pgvector is a Postgres extension that adds vector storage and similarity search to an existing database, so you can run semantic queries directly against your application data without standing up a separate vector store. If you're already on Postgres, you can enable it with one CREATE EXTENSION statement, add a vector column to any table, and have semantic search returning results the same day.
This post walks through adding it to an existing app — from installing the extension to running your first semantic query, with an HNSW index for performance at scale.
TL;DR
- What it is: A Postgres extension that adds a vector column type and similarity-search operators (<=>, <->, <#>).
- Why it matters: Semantic search without a separate vector database, hybrid keyword-and-semantic queries in one SQL statement, and no new service to operate.
- Five steps to ship it: Install the extension, add a vector(N) column, embed at write time, query with cosine similarity, add an HNSW index for scale.
- Embedding cost: ~$0.02 per million tokens with text-embedding-3-small. Ollama runs embedding models locally for free if you'd rather not depend on a provider.
- When to upgrade beyond pgvector: Tens of millions of vectors with sub-50ms latency requirements. Below that, pgvector is plenty.
What's the difference between keyword search and semantic search?
Keyword search finds exact matches. If a user searches "cholesterol prescription" and your record says "lipid panel results," they get nothing.
Semantic search finds meaning. It understands that "cholesterol prescription" and "lipid panel results" are related concepts, and surfaces the right record even without a word match.
That's what vector embeddings buy you. Instead of storing text, you store a numerical representation of what that text means. Search becomes a question of mathematical similarity rather than string matching.
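To make "mathematical similarity" concrete, here's a tiny illustration of cosine similarity, the comparison behind the <=> operator (which returns 1 minus this value as a distance). The three-dimensional vectors are made up; real embedding models output hundreds or thousands of dimensions:

function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors standing in for the "meanings" of two phrases
console.log(cosineSimilarity([0.9, 0.1, 0.0], [0.8, 0.2, 0.1])); // close to 1: related
console.log(cosineSimilarity([0.9, 0.1, 0.0], [0.0, 0.1, 0.9])); // close to 0: unrelated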
Step 1: Enable the extension
If you're running Postgres locally or in Docker, install pgvector first:
# Ubuntu / Debian
sudo apt install postgresql-16-pgvector
# or via Docker — use the pgvector image instead of plain postgres
# docker pull pgvector/pgvector:pg16
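If you go the Docker route, a minimal run command looks something like this (the container name, password, and port mapping are placeholders, not recommendations):

# run a throwaway Postgres 16 with pgvector baked in
docker run -d --name pgvector-demo \
  -e POSTGRES_PASSWORD=postgres \
  -p 5432:5432 \
  pgvector/pgvector:pg16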
Then enable it in your database:
CREATE EXTENSION IF NOT EXISTS vector;
That's it. No separate service, no new connection string.
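If you want to double-check that the extension is active, this query should return one row:

SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';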
Step 2: Add an embedding column to your table
Pick whichever table holds the content you want to make searchable. Add a vector column — the dimension count needs to match the embedding model you'll use.
OpenAI's text-embedding-3-small outputs 1536 dimensions:
ALTER TABLE documents ADD COLUMN embedding vector(1536);
If you use a different model, check its output dimension and use that number instead. The dimension has to be consistent — you can't mix embeddings from different models in the same column.
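For example, nomic-embed-text, a popular local model you can run through Ollama, outputs 768 dimensions, so you'd declare the column as vector(768) instead:

ALTER TABLE documents ADD COLUMN embedding vector(768);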
Step 3: Generate embeddings when content is saved
Whenever a record is created or updated, generate an embedding from its text content and store it. Here's a Node.js example using the OpenAI SDK:
import OpenAI from "openai";
const openai = new OpenAI();
async function generateEmbedding(text) {
const response = await openai.embeddings.create({
model: "text-embedding-3-small",
input: text,
});
return response.data[0].embedding;
}
async function saveDocument(db, doc) {
// Build a text representation of what you want to be searchable
const textToEmbed = `${doc.title} ${doc.tags.join(" ")} ${doc.content}`;
const embedding = await generateEmbedding(textToEmbed);
await db.query(
`INSERT INTO documents (title, content, tags, embedding)
VALUES ($1, $2, $3, $4)`,
[doc.title, doc.content, doc.tags, JSON.stringify(embedding)]
);
}
A few things worth noting here:
- What you embed matters. Concatenating title, tags, and content into one string gives the model more signal than just the raw content. Experiment with what makes your search results feel right.
- Embed at write time, not search time. Pre-computing embeddings keeps search fast. You don't want to embed on every query.
- If you have existing records, run a backfill script to generate embeddings for everything already in the database before you go live; a sketch of one follows below.
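A minimal backfill sketch, reusing the generateEmbedding helper from above with a node-postgres style client (batching, retries, and rate limiting are left out for brevity):

async function backfillEmbeddings(db) {
  // Only touch rows that haven't been embedded yet
  const { rows } = await db.query(
    `SELECT id, title, content, tags FROM documents WHERE embedding IS NULL`
  );

  for (const doc of rows) {
    const textToEmbed = `${doc.title} ${doc.tags.join(" ")} ${doc.content}`;
    const embedding = await generateEmbedding(textToEmbed);
    await db.query(
      `UPDATE documents SET embedding = $1 WHERE id = $2`,
      [JSON.stringify(embedding), doc.id]
    );
  }
}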
Step 4: Search with cosine similarity
When a user submits a search query, embed it the same way you embedded your content, then find the closest matches:
async function semanticSearch(db, query, limit = 10) {
const queryEmbedding = await generateEmbedding(query);
const result = await db.query(
`SELECT id, title, content,
1 - (embedding <=> $1) AS similarity
FROM documents
ORDER BY embedding <=> $1
LIMIT $2`,
[JSON.stringify(queryEmbedding), limit]
);
return result.rows;
}
The <=> operator is cosine distance — lower means more similar. Subtracting it from 1, as in 1 - (embedding <=> $1), turns the distance into a similarity score (roughly 0 to 1 for typical text embeddings) that you can display or filter on as a confidence signal.
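If you do want a cutoff, repeat the distance expression in the WHERE clause, since the alias isn't visible there. The 0.5 threshold below is arbitrary and worth tuning against your own data:

SELECT id, title, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE embedding <=> $1 < 0.5
ORDER BY embedding <=> $1
LIMIT 10;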
Step 5: Add an index for performance
Without an index, Postgres does an exact nearest-neighbor scan across every row — fine for small tables, slow for large ones. Add an HNSW index to keep queries fast at scale:
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops);
HNSW (Hierarchical Navigable Small World) is an approximate nearest-neighbor algorithm. It trades a tiny amount of recall accuracy for a large speed gain. For most applications the tradeoff is well worth it.
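The defaults are sensible, but pgvector also exposes two build-time parameters (m and ef_construction) and one query-time setting (hnsw.ef_search) if you want to trade memory and build time for recall. The values below are the documented defaults plus a modest ef_search bump, not tuned recommendations:

-- build-time: higher values improve recall at the cost of memory and build time
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- query-time: how many candidates each search considers (default is 40)
SET hnsw.ef_search = 100;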
Putting it together
Here's what the full flow looks like:
- User saves a document → you generate an embedding → store it in the embedding column
- User searches → you embed the query → run cosine similarity against stored embeddings → return top matches
- Results feel like the app actually understands what the user is looking for
A few things to keep in mind
Embedding cost is low but not zero. OpenAI's text-embedding-3-small is cheap — around $0.02 per million tokens — but it adds up at scale. If you're embedding large documents frequently, keep an eye on usage.
Local embeddings are an option. If you want to keep everything in-house, Ollama can run embedding models locally. The quality varies by model, but for many use cases it's more than good enough and costs nothing per query.
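As a rough sketch of what that swap looks like, here's a variant of generateEmbedding that calls a local Ollama instance instead of OpenAI. It assumes Ollama is running on its default port with nomic-embed-text pulled, and that your Ollama version exposes the /api/embeddings endpoint; check the Ollama docs if the request shape has changed:

async function generateLocalEmbedding(text) {
  // Assumes `ollama pull nomic-embed-text` has been run and Ollama is listening on 11434
  const response = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
  });
  const data = await response.json();
  return data.embedding; // 768 numbers for nomic-embed-text, so the column must be vector(768)
}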
Hybrid search is often better. Semantic search alone can miss exact matches that keyword search would catch. For production apps, consider combining both — run a keyword search with tsvector and a vector search with pgvector, then merge and rank the results. This is sometimes called hybrid search or reciprocal rank fusion.
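Here's one sketch of reciprocal rank fusion in a single statement, assuming the documents table from earlier, an English text-search configuration, and two parameters: $1 is the query embedding and $2 is the raw query string. The inner LIMITs and the constant 60 are conventional defaults, not tuned values:

WITH semantic AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $1) AS sem_rank
  FROM documents
  ORDER BY embedding <=> $1
  LIMIT 20
),
keyword AS (
  SELECT id,
         ROW_NUMBER() OVER (
           ORDER BY ts_rank(to_tsvector('english', content),
                            plainto_tsquery('english', $2)) DESC
         ) AS kw_rank
  FROM documents
  WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $2)
  ORDER BY kw_rank
  LIMIT 20
)
SELECT COALESCE(s.id, k.id) AS id,
       COALESCE(1.0 / (60 + s.sem_rank), 0) + COALESCE(1.0 / (60 + k.kw_rank), 0) AS score
FROM semantic s
FULL OUTER JOIN keyword k ON s.id = k.id
ORDER BY score DESC
LIMIT 10;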
Chunking matters for long documents. Embedding a 10,000-word document as a single vector loses a lot of nuance. For long content, chunk it into paragraphs or sections, embed each chunk separately, and link chunks back to the parent document.
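A sketch of that pattern, assuming a separate document_chunks table with a parent_id column pointing back to documents. The table name, the paragraph-based splitter, and the 1,500-character cap are all illustrative choices, not a prescription:

// Naive splitter: group paragraphs until a size cap; real chunkers often overlap chunks
function chunkText(content, maxChars = 1500) {
  const paragraphs = content.split(/\n\s*\n/);
  const chunks = [];
  let current = "";
  for (const p of paragraphs) {
    if (current && current.length + p.length > maxChars) {
      chunks.push(current);
      current = p;
    } else {
      current = current ? `${current}\n\n${p}` : p;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

async function saveDocumentWithChunks(db, doc) {
  const { rows } = await db.query(
    `INSERT INTO documents (title, content, tags) VALUES ($1, $2, $3) RETURNING id`,
    [doc.title, doc.content, doc.tags]
  );
  const parentId = rows[0].id;

  for (const chunk of chunkText(doc.content)) {
    // Prefix the title so each chunk keeps some document-level context
    const embedding = await generateEmbedding(`${doc.title} ${chunk}`);
    await db.query(
      `INSERT INTO document_chunks (parent_id, chunk_text, embedding) VALUES ($1, $2, $3)`,
      [parentId, chunk, JSON.stringify(embedding)]
    );
  }
}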
pgvector is one of those things that looks complicated from the outside but is surprisingly approachable once you start. If you're already on Postgres, there's no reason not to have it.
FAQ
Do I need a separate vector database if I'm already using Postgres?
For most apps, no. pgvector handles tens of millions of vectors comfortably with an HNSW index, and you keep the operational simplicity of one database. You'd reach for a dedicated vector store (Pinecone, Weaviate, Qdrant, Milvus) only when you need extreme scale, very low latency, or specialized features like hybrid sparse/dense indexing that pgvector doesn't cover.
Which embedding model should I use with pgvector?
For most production use cases, OpenAI's text-embedding-3-small (1536 dims) is the default — cheap, fast, and high quality. Use text-embedding-3-large (3072 dims) if you need more accuracy and can pay for it. For local/private deployments, Ollama running nomic-embed-text or mxbai-embed-large is a solid choice. The dimension number in your column type has to match the model.
What's the difference between HNSW and IVFFlat indexes?
HNSW is faster to query and gives better recall, but takes longer to build and uses more memory. IVFFlat is faster to build, lighter on memory, but slower to query and less accurate. For most production workloads, HNSW is the right default. IVFFlat is fine if you're indexing very large datasets infrequently and care about build time.
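For comparison, an IVFFlat index looks like this. The lists value partitions the data into clusters (the pgvector docs suggest roughly rows / 1000 for tables up to a million rows), and the table should already contain data when you build it:

CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- query-time knob: how many clusters to probe (higher means better recall, slower queries)
SET ivfflat.probes = 10;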
Should I use cosine, L2, or inner product distance?
Cosine distance (<=>) is the right default for text embeddings — it ignores vector magnitude and only compares direction, which matches how text embedding models are trained. Use L2 (<->) for image embeddings or anything where magnitude carries meaning. Inner product (<#>) is the cheapest to compute, and when your vectors are already normalized to unit length it ranks results the same way cosine does (pgvector returns the negative inner product, so smaller still means closer).
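One practical detail: the operator you query with has to match the operator class you indexed with, or Postgres falls back to scanning without the index. For example:

-- built for cosine distance, so only <=> queries can use it
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- built for L2 distance, so only <-> queries can use it
CREATE INDEX ON documents USING hnsw (embedding vector_l2_ops);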
Do I need to re-embed when I update a record?
Only if the text you embedded changed. The cleanest pattern is to embed a derived "search text" string (title + tags + content), and re-embed whenever any of those source fields change. A trigger or BEFORE UPDATE hook keeps it in sync.
Can I combine semantic search with regular SQL filters?
Yes — that's one of pgvector's biggest advantages. You can add WHERE user_id = $1 AND status = 'active' to an ORDER BY embedding <=> $2 LIMIT 10 query and get filtered semantic search in one statement. With a separate vector store, you'd have to filter in two places and reconcile the results.
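Spelled out as a full statement, using the documents table from earlier plus hypothetical user_id and status columns:

SELECT id, title, 1 - (embedding <=> $2) AS similarity
FROM documents
WHERE user_id = $1
  AND status = 'active'
ORDER BY embedding <=> $2
LIMIT 10;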