2025 · AI · Developer Tools · BETA

HELIX.ai

An AI copilot that understands your codebase.

Context-aware AI pair programmer. Indexes your repo into a semantic graph, retrieves relevant code at query time, and produces suggestions with full file-level context — not just window snippets.

Duration: 9 weeks
Team: Solo + design contractor
My role: Solo full-stack + ML

Outcome
Active developers: 12.4k (+47% MoM)
Prompts per day: 189k
P50 latency: 340ms
Token cost saved: 62%
The story

From brief to production system.

Challenge

Off-the-shelf AI assistants miss codebase context. They hallucinate APIs, misuse internal patterns, and treat each prompt as a cold start. Engineers waste cycles correcting LLM output instead of writing code.

Solution

Hybrid retrieval: AST-aware chunking via Tree-sitter, pgvector semantic search, and symbol-graph traversal. Requests to Claude 4.6 route through a custom prompt cache with a 78% hit rate. Responses stream over Server-Sent Events to VSCode and JetBrains plugins.

Outcome

Beta users report 3.4x faster feature delivery on legacy code. 62% reduction in token cost vs naive RAG. Onboarding time for new engineers cut from 2 weeks to 4 days. Currently in private beta with 14 design partners.

Process · 9 weeks

How it shipped, week by week.

Week 1
01 / 5

Retrieval research

Spent a week reading prior art on code retrieval. Decided on hybrid AST + semantic + symbol graph after benchmarking three approaches on a fixed eval set.
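
For flavor, a hypothetical harness for that kind of fixed-set benchmark: an eval case pairs a query with the chunk IDs a correct answer must cite, and recall@k scores each strategy. The EvalCase shape and retrieve signature here are assumptions, not the actual eval set.

# Hypothetical benchmark harness: each eval case pairs a query with the
# chunk IDs a correct answer must cite; recall@k is the fraction of
# those IDs that a retrieval strategy returns in its top k.
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass
class EvalCase:
    query: str
    relevant_ids: set[str]  # ground-truth chunk IDs for this query


async def recall_at_k(
    retrieve: Callable[[str, int], Awaitable[list[str]]],
    cases: list[EvalCase],
    k: int = 12,
) -> float:
    hits = total = 0
    for case in cases:
        retrieved = set(await retrieve(case.query, k))
        hits += len(retrieved & case.relevant_ids)
        total += len(case.relevant_ids)
    return hits / total if total else 0.0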

Week 2-3
02 / 5

Indexer + storage

Built the Tree-sitter indexer. Settled on pgvector with HNSW after benchmarking it against FAISS and Pinecone; it won on operational simplicity at our scale.
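
A minimal sketch of what AST-aware chunking can look like, assuming the py-tree-sitter 0.22+ bindings; the chosen node types and the top-level-only walk are illustrative, not the production indexer.

# Illustrative AST-aware chunker: split a Python source file on
# top-level definitions instead of fixed windows, so every chunk is a
# syntactically complete unit. Binding API shown is py-tree-sitter 0.22+.
import tree_sitter_python as tspython
from tree_sitter import Language, Parser

PY_LANGUAGE = Language(tspython.language())
parser = Parser(PY_LANGUAGE)

CHUNK_TYPES = {"function_definition", "class_definition", "decorated_definition"}


def chunk_source(source: str) -> list[str]:
    raw = source.encode("utf-8")  # node offsets are byte offsets
    tree = parser.parse(raw)
    return [
        raw[node.start_byte : node.end_byte].decode("utf-8")
        for node in tree.root_node.children
        if node.type in CHUNK_TYPES  # skip imports, comments, bare statements
    ]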

Week 4-6
03 / 5

API + caching layer

FastAPI gateway with prompt cache. Tuned the cache-key strategy around Anthropic's 5-minute prompt-cache TTL. Hit rate climbed from 12% to 78% over two weeks of tuning.
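
One plausible shape for that cache key, assuming the fingerprint is a stable hash over the model, system prompt, and ordered context blocks; the actual key strategy isn't reproduced here.

# Hypothetical cache key: Anthropic's prompt cache matches on exact
# prefixes, so the fingerprint must be stable across requests that
# share the same model, system prompt, and context-block ordering.
import hashlib


def prompt_fingerprint(model: str, system: str, context_blocks: list[str]) -> str:
    h = hashlib.sha256()
    h.update(model.encode())
    h.update(system.encode())
    for block in context_blocks:  # order matters: a reordered prefix is a miss
        h.update(hashlib.sha256(block.encode()).digest())
    return h.hexdigest()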

Week 7-8
04 / 5

Editor plugins

VSCode plugin built on the LSP; JetBrains plugin via their plugin platform. Streamed responses over SSE for sub-second time-to-first-token.
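
A minimal sketch of the SSE path, assuming a FastAPI StreamingResponse; stream_completion is a stand-in for the real Claude streaming call, and the route name is made up.

# Minimal SSE sketch: FastAPI streams tokens as they arrive so the
# editor renders the first token without waiting for the full reply.
# `stream_completion` stands in for the real Claude streaming call.
from collections.abc import AsyncIterator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


async def stream_completion(prompt: str) -> AsyncIterator[str]:
    for token in ("stub ", "tokens ", "here"):  # placeholder output
        yield token


@app.post("/v1/complete")
async def complete(prompt: str) -> StreamingResponse:
    async def events() -> AsyncIterator[str]:
        async for token in stream_completion(prompt):
            yield f"data: {token}\n\n"  # one SSE event per token
        yield "data: [DONE]\n\n"

    return StreamingResponse(events(), media_type="text/event-stream")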

Week 9
05 / 5

Beta launch

Onboarded 14 design partners. Set up usage telemetry and error tracking. Iterating weekly based on partner feedback.

Inside the system

What it does. How it's built.

Features

  • AST-aware repo indexing via Tree-sitter (12 languages)
  • Symbol graph + dependency map for cross-file reasoning (see the expansion sketch after this list)
  • Semantic search via pgvector (HNSW index)
  • Prompt caching aligned with Anthropic's 5-min TTL
  • VSCode + JetBrains plugins
  • Streaming SSE responses for sub-second time-to-first-token
  • Per-team prompt templates
  • Local-first index option (no code leaves your machine)
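
The symbol-graph expansion referenced above (and in the retrieval excerpt further down) reduces to a depth-limited breadth-first walk. A minimal sketch, assuming an in-memory edges adjacency map; the real graph presumably lives in Postgres.

# Hypothetical depth-limited BFS over a symbol graph: starting from the
# symbols in the top semantic hits, pull in definitions they reference
# (callees, imports, parent classes) up to `depth` hops away.
from collections import deque


def expand_symbols(
    edges: dict[str, set[str]],  # symbol_id -> directly referenced symbol_ids
    seeds: list[str],
    depth: int = 2,
) -> set[str]:
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        symbol, d = frontier.popleft()
        if d == depth:
            continue
        for neighbor in edges.get(symbol, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, d + 1))
    return seen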

Architecture

  • 01 · FastAPI gateway with Pydantic validation
  • 02 · Tree-sitter parsers for 12 languages (Python, TS, Go, Rust, Java, ...)
  • 03 · PostgreSQL + pgvector for embeddings (HNSW for fast recall)
  • 04 · Redis for prompt cache + session state
  • 05 · Claude API: Sonnet for code, Haiku for routing decisions (routing sketch below)
  • 06 · React + TanStack Query frontend
  • 07 · VSCode extension built on the LSP
  • 08 · Local indexer in Rust for the local-first option
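
A sketch of the Sonnet/Haiku split using the Anthropic Python SDK; the model IDs and the routing prompt are placeholders, not production values.

# Illustrative model router: a cheap Haiku call classifies the request,
# and only code-generation requests pay for Sonnet. Model IDs below are
# placeholders; pin real versions in production.
import anthropic

client = anthropic.AsyncAnthropic()

ROUTER_MODEL = "claude-3-5-haiku-latest"  # placeholder ID
CODE_MODEL = "claude-sonnet-4-5"          # placeholder ID


async def route_and_answer(prompt: str, context: str) -> str:
    routing = await client.messages.create(
        model=ROUTER_MODEL,
        max_tokens=5,
        messages=[{
            "role": "user",
            "content": f"Answer CODE or CHAT only. Request: {prompt}",
        }],
    )
    model = CODE_MODEL if "CODE" in routing.content[0].text else ROUTER_MODEL
    reply = await client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": f"{context}\n\n{prompt}"}],
    )
    return reply.content[0].text
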
Stack
Python · FastAPI · React · PostgreSQL · pgvector · Claude API · Redis · Tree-sitter · Server-Sent Events
From the codebase

Annotated excerpts.

01 · Hybrid retrieval: semantic search + symbol-graph expansion + reranking.
core/retrieval.py · python
async def retrieve_context(
    query: str,
    repo: RepoIndex,
    k: int = 12,
) -> list[CodeChunk]:
    embedding = await embed(query)

    # Vector search across chunks
    semantic = await repo.vector_search(embedding, k=k * 2)

    # Walk symbol graph from top hits
    graph_hits = await repo.expand_symbols(
        seeds=[h.symbol_id for h in semantic[:5]],
        depth=2,
    )

    # Rerank with cross-encoder
    merged = dedupe(semantic + graph_hits)
    reranked = await rerank(query, merged)

    return reranked[:k]
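
The repo.vector_search call above is where pgvector does its work. A minimal sketch of one way it might look, assuming asyncpg, a chunks table, and cosine distance; none of these names are the real schema.

# Hypothetical shape of repo.vector_search: cosine distance (`<=>`)
# over a pgvector column, served by an HNSW index, e.g.
#   CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);
# Table and column names are illustrative.
import asyncpg


async def vector_search(
    pool: asyncpg.Pool,
    embedding: list[float],
    k: int,
) -> list[asyncpg.Record]:
    vec = "[" + ",".join(f"{x:g}" for x in embedding) + "]"  # pgvector text format
    return await pool.fetch(
        """
        SELECT id, symbol_id, path, content
        FROM chunks
        ORDER BY embedding <=> $1::vector
        LIMIT $2
        """,
        vec,
        k,
    )
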
02 · Prompt cache aligned with Anthropic's 5-min TTL.
core/cache.py · python
class PromptCache:
    """Cache aligned with Anthropic's 5-minute TTL."""

    TTL_SECONDS = 5 * 60

    async def get_or_compute(
        self,
        key: PromptKey,
        compute: Callable[[], Awaitable[Response]],
    ) -> CachedResponse:
        cached = await self.redis.get(key.fingerprint)
        if cached and not self._stale(cached):
            # Stored as JSON on write, so parse back into a Response on read
            return CachedResponse(
                response=Response.model_validate_json(cached),
                cache_hit=True,
            )

        response = await compute()
        await self.redis.setex(
            key.fingerprint,
            self.TTL_SECONDS,
            response.model_dump_json(),
        )
        return CachedResponse(response=response, cache_hit=False)
What the client said
“He integrated Claude into our editor and the cache hit rate is sitting at 78%. Token bill dropped by more than half month-over-month. Excellent communicator — async-friendly, no theatrics, just delivery.”
Daniel Ortiz
Product Lead · Stitch.io

Have a project like this in mind? Let's talk.

Send me a brief and I'll respond within 24 hours.

← Home · © 2025 Ali Razzaq · Contact →