Scaling Postgres with pgvector — what we learned at 2M embeddings
Lessons from running a production HNSW vector index on Postgres for code search at scale. Recall, latency, and the operational tradeoffs nobody warns you about.
When we shipped HELIX.ai's semantic code search, we benched three vector stores: FAISS, Pinecone, and Postgres + pgvector. We expected pgvector to lose. It didn't — but only after we tuned three things almost nobody mentions in introductory blog posts.
Why we picked Postgres
Operational simplicity. We were already running Postgres for everything else — auth, billing, repo metadata. Adding a managed vector service meant another deploy target, another set of secrets, another paging surface. pgvector keeps embeddings inside the same transaction boundary as the rest of the data, which is more useful than it sounds.
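To make the transaction-boundary point concrete, here is a minimal sketch. The repos table, its columns, and the literal values are hypothetical; the shape is what matters: repo metadata and its embeddings commit or roll back as one unit.

-- Hypothetical schema: repo metadata and chunk embeddings live in the
-- same database, so a single transaction covers both writes.
BEGIN;

UPDATE repos
SET    last_indexed_at = now()
WHERE  repo_id = 42;

INSERT INTO embeddings (repo_id, chunk_id, content, vector)
VALUES (42, 'src/main.py:0', 'def main(): ...', '[0.01, -0.03, 0.12]');

COMMIT;  -- either both writes land, or neither does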
The first surprise: HNSW vs IVFFlat
pgvector ships two index types. We started with IVFFlat because every tutorial uses it. At 200k embeddings, recall was 92% at acceptable latency. At 2M, recall fell off a cliff — to ~78%. HNSW solved it, but at a cost: index build times went from minutes to over an hour, and memory usage roughly doubled.
-- IVFFlat: fast to build, weaker at scale
CREATE INDEX ON embeddings USING ivfflat (vector vector_cosine_ops)
WITH (lists = 1000);
-- HNSW: slow to build, far better recall
CREATE INDEX ON embeddings USING hnsw (vector vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

The numbers that matter: at 2M rows, HNSW gave us 96% recall@10 with p95 query latency of 38ms. IVFFlat at the same configuration gave 78% recall and 71ms p95.
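The hour-plus build time is partly tunable. As a hedged aside, since the right values depend on your RAM and pgvector version: HNSW builds are heavily influenced by maintenance_work_mem, and newer pgvector releases can parallelize the build. A minimal sketch, with an illustrative index name:

-- Session settings worth setting before a big HNSW build.
-- Values are illustrative; size them to your hardware.
SET maintenance_work_mem = '8GB';           -- HNSW builds want the graph in memory
SET max_parallel_maintenance_workers = 4;   -- parallel HNSW builds need a recent pgvector
CREATE INDEX embeddings_vector_hnsw
    ON embeddings USING hnsw (vector vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);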
The second surprise: ef_search at query time
HNSW has a runtime parameter — ef_search — that's invisible to most ORMs. The default of 40 was fine for our top-5 queries but tanked recall on top-50. Setting it per-query was the fix.
# Set ef_search per session, not at index creation
async def vector_search(query_embedding, k: int):
    # query_embedding is the already-embedded query vector; this assumes the
    # pool has pgvector's asyncpg codec registered so the vector binds cleanly
    async with pool.acquire() as conn:
        # SET LOCAL only takes effect inside a transaction block
        async with conn.transaction():
            # Tune for the result size we want
            ef = max(40, k * 4)
            await conn.execute(f"SET LOCAL hnsw.ef_search = {ef}")
            return await conn.fetch("""
                SELECT chunk_id, content, 1 - (vector <=> $1) AS score
                FROM embeddings
                ORDER BY vector <=> $1
                LIMIT $2
            """, query_embedding, k)

The third surprise: VACUUM and bloat
We were rebuilding embeddings nightly as the indexed corpus grew. Within a week, query latency degraded by 40%. Postgres' MVCC means deleted tuples stay around until VACUUM cleans them up — and HNSW indexes are sensitive to bloat.
- Run `VACUUM ANALYZE` after any large delete or bulk update
- Schedule `REINDEX` on the HNSW index weekly during low-traffic windows
- Monitor `idx_scan` in `pg_stat_user_indexes` alongside `n_dead_tup` in `pg_stat_user_tables` for the table backing the index (see the sketch below)
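Putting that list into commands, a sketch assuming the embeddings table from earlier and an index named embeddings_vector_hnsw; substitute your own names:

-- After the nightly rebuild: reclaim dead tuples and refresh planner stats
VACUUM ANALYZE embeddings;

-- Weekly, during a low-traffic window: rebuild the HNSW index without
-- blocking reads or writes (REINDEX CONCURRENTLY needs Postgres 12+)
REINDEX INDEX CONCURRENTLY embeddings_vector_hnsw;

-- Bloat check: dead tuples on the table vs. scans on the index
SELECT t.relname,
       t.n_dead_tup,
       i.indexrelname,
       i.idx_scan
FROM   pg_stat_user_tables  t
JOIN   pg_stat_user_indexes i USING (relid)
WHERE  t.relname = 'embeddings';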
What I'd do differently
If I started over, I'd commit to HNSW from day one and skip the IVFFlat phase entirely. The index build cost is real but pays for itself within a week of production use. I'd also bake a small reranker on top — semantic search alone gets you 80% of the way; a cross-encoder rerank closes the rest.
"Vector search is mostly about systems engineering. The model picks the candidates; the database determines whether the system can be operated by humans on call."