2025 · AI · Developer Tools · BETA

HELIX.ai

An AI copilot that understands your codebase.

Context-aware AI pair programmer. Indexes your repo into a semantic graph, retrieves relevant code at query time, and produces suggestions with full file-level context — not just window snippets.

Duration: 9 weeks
Team: Solo + design contractor
My role: Solo full-stack + ML

Outcome
Active developers: 12.4k (+47% MoM)
Prompts per day: 189k
P50 latency: 340ms
Token cost saved: 62%
The story

From brief to production system.

Challenge

Off-the-shelf AI assistants miss codebase context. They hallucinate APIs, misuse internal patterns, and treat each prompt as a cold start. Engineers waste cycles correcting LLM output instead of writing code.

Solution

Hybrid retrieval: AST-aware chunking via Tree-sitter, pgvector semantic search, and symbol-graph traversal. Requests to Claude 4.6 route through a custom prompt cache with a 78% hit rate. Responses stream over Server-Sent Events to VSCode and JetBrains plugins.

Outcome

Beta users report 3.4x faster feature delivery on legacy code. 62% reduction in token cost vs naive RAG. Onboarding time for new engineers cut from 2 weeks to 4 days. Currently in private beta with 14 design partners.

Process · 9 weeks

How it shipped, week by week.

Week 1
01 / 5

Retrieval research

Spent a week reading prior art on code retrieval. Decided on hybrid AST + semantic + symbol graph after benchmarking three approaches on a fixed eval set.
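
For flavor, a hypothetical harness for that kind of fixed-set benchmark: an eval case pairs a query with the chunk IDs a correct answer must cite, and recall@k scores each strategy. The EvalCase shape and retrieve signature here are assumptions, not the actual eval set.

# Hypothetical benchmark harness: each eval case pairs a query with the
# chunk IDs a correct answer must cite; recall@k is the fraction of
# those IDs that a retrieval strategy returns in its top k.
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass
class EvalCase:
    query: str
    relevant_ids: set[str]  # ground-truth chunk IDs for this query


async def recall_at_k(
    retrieve: Callable[[str, int], Awaitable[list[str]]],
    cases: list[EvalCase],
    k: int = 12,
) -> float:
    hits = total = 0
    for case in cases:
        retrieved = set(await retrieve(case.query, k))
        hits += len(retrieved & case.relevant_ids)
        total += len(case.relevant_ids)
    return hits / total if total else 0.0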

Week 2-3
02 / 5

Indexer + storage

Built the Tree-sitter indexer. Settled on pgvector with HNSW after benchmarking it against FAISS and Pinecone; it won on operational simplicity at our scale.
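
A minimal sketch of what AST-aware chunking can look like, assuming the py-tree-sitter 0.22+ bindings; the chosen node types and the top-level-only walk are illustrative, not the production indexer.

# Illustrative AST-aware chunker: split a Python source file on
# top-level definitions instead of fixed windows, so every chunk is a
# syntactically complete unit. Binding API shown is py-tree-sitter 0.22+.
import tree_sitter_python as tspython
from tree_sitter import Language, Parser

PY_LANGUAGE = Language(tspython.language())
parser = Parser(PY_LANGUAGE)

CHUNK_TYPES = {"function_definition", "class_definition", "decorated_definition"}


def chunk_source(source: str) -> list[str]:
    raw = source.encode("utf-8")  # node offsets are byte offsets
    tree = parser.parse(raw)
    return [
        raw[node.start_byte : node.end_byte].decode("utf-8")
        for node in tree.root_node.children
        if node.type in CHUNK_TYPES  # skip imports, comments, bare statements
    ]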

Week 4-6
03 / 5

API + caching layer

FastAPI gateway with prompt cache. Tuned the cache-key strategy around Anthropic's 5-minute prompt-cache TTL. Hit rate climbed from 12% to 78% over two weeks of tuning.
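
One plausible shape for that cache key, assuming the fingerprint is a stable hash over the model, system prompt, and ordered context blocks; the actual key strategy isn't reproduced here.

# Hypothetical cache key: Anthropic's prompt cache matches on exact
# prefixes, so the fingerprint must be stable across requests that
# share the same model, system prompt, and context-block ordering.
import hashlib


def prompt_fingerprint(model: str, system: str, context_blocks: list[str]) -> str:
    h = hashlib.sha256()
    h.update(model.encode())
    h.update(system.encode())
    for block in context_blocks:  # order matters: a reordered prefix is a miss
        h.update(hashlib.sha256(block.encode()).digest())
    return h.hexdigest()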

Week 7-8
04 / 5

Editor plugins

VSCode plugin built on the LSP; JetBrains plugin via their plugin platform. Streamed responses over SSE for sub-second time-to-first-token.
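
A minimal sketch of the SSE path, assuming a FastAPI StreamingResponse; stream_completion is a stand-in for the real Claude streaming call, and the route name is made up.

# Minimal SSE sketch: FastAPI streams tokens as they arrive so the
# editor renders the first token without waiting for the full reply.
# `stream_completion` stands in for the real Claude streaming call.
from collections.abc import AsyncIterator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


async def stream_completion(prompt: str) -> AsyncIterator[str]:
    for token in ("stub ", "tokens ", "here"):  # placeholder output
        yield token


@app.post("/v1/complete")
async def complete(prompt: str) -> StreamingResponse:
    async def events() -> AsyncIterator[str]:
        async for token in stream_completion(prompt):
            yield f"data: {token}\n\n"  # one SSE event per token
        yield "data: [DONE]\n\n"

    return StreamingResponse(events(), media_type="text/event-stream")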

Week 9
05 / 5

Beta launch

Onboarded 14 design partners. Set up usage telemetry and error tracking. Iterating weekly based on partner feedback.

Inside the system

What it does. How it's built.

Features

  • AST-aware repo indexing via Tree-sitter (12 languages)
  • Symbol graph + dependency map for cross-file reasoning (see the expansion sketch after this list)
  • Semantic search via pgvector (HNSW index)
  • Prompt caching aligned with Anthropic's 5-min TTL
  • VSCode + JetBrains plugins
  • Streaming SSE responses for sub-second time-to-first-token
  • Per-team prompt templates
  • Local-first index option (no code leaves your machine)
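
The symbol-graph expansion referenced above (and in the retrieval excerpt further down) reduces to a depth-limited breadth-first walk. A minimal sketch, assuming an in-memory edges adjacency map; the real graph presumably lives in Postgres.

# Hypothetical depth-limited BFS over a symbol graph: starting from the
# symbols in the top semantic hits, pull in definitions they reference
# (callees, imports, parent classes) up to `depth` hops away.
from collections import deque


def expand_symbols(
    edges: dict[str, set[str]],  # symbol_id -> directly referenced symbol_ids
    seeds: list[str],
    depth: int = 2,
) -> set[str]:
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        symbol, d = frontier.popleft()
        if d == depth:
            continue
        for neighbor in edges.get(symbol, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, d + 1))
    return seen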

Architecture

  • 01 · FastAPI gateway with Pydantic validation
  • 02 · Tree-sitter parsers for 12 languages (Python, TS, Go, Rust, Java, ...)
  • 03 · PostgreSQL + pgvector for embeddings (HNSW for fast recall)
  • 04 · Redis for prompt cache + session state
  • 05 · Claude API: Sonnet for code, Haiku for routing decisions (routing sketch below)
  • 06 · React + TanStack Query frontend
  • 07 · VSCode extension built on the LSP
  • 08 · Local indexer in Rust for the local-first option
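
A sketch of the Sonnet/Haiku split using the Anthropic Python SDK; the model IDs and the routing prompt are placeholders, not production values.

# Illustrative model router: a cheap Haiku call classifies the request,
# and only code-generation requests pay for Sonnet. Model IDs below are
# placeholders; pin real versions in production.
import anthropic

client = anthropic.AsyncAnthropic()

ROUTER_MODEL = "claude-3-5-haiku-latest"  # placeholder ID
CODE_MODEL = "claude-sonnet-4-5"          # placeholder ID


async def route_and_answer(prompt: str, context: str) -> str:
    routing = await client.messages.create(
        model=ROUTER_MODEL,
        max_tokens=5,
        messages=[{
            "role": "user",
            "content": f"Answer CODE or CHAT only. Request: {prompt}",
        }],
    )
    model = CODE_MODEL if "CODE" in routing.content[0].text else ROUTER_MODEL
    reply = await client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": f"{context}\n\n{prompt}"}],
    )
    return reply.content[0].text
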
Stack
Python · FastAPI · React · PostgreSQL · pgvector · Claude API · Redis · Tree-sitter · Server-Sent Events
From the codebase

Annotated excerpts.

01 · Hybrid retrieval: semantic search + symbol-graph expansion + reranking.
core/retrieval.py · python
async def retrieve_context(
    query: str,
    repo: RepoIndex,
    k: int = 12,
) -> list[CodeChunk]:
    embedding = await embed(query)

    # Vector search across chunks
    semantic = await repo.vector_search(embedding, k=k * 2)

    # Walk symbol graph from top hits
    graph_hits = await repo.expand_symbols(
        seeds=[h.symbol_id for h in semantic[:5]],
        depth=2,
    )

    # Rerank with cross-encoder
    merged = dedupe(semantic + graph_hits)
    reranked = await rerank(query, merged)

    return reranked[:k]
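
The repo.vector_search call above is where pgvector does its work. A minimal sketch of one way it might look, assuming asyncpg, a chunks table, and cosine distance; none of these names are the real schema.

# Hypothetical shape of repo.vector_search: cosine distance (`<=>`)
# over a pgvector column, served by an HNSW index, e.g.
#   CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);
# Table and column names are illustrative.
import asyncpg


async def vector_search(
    pool: asyncpg.Pool,
    embedding: list[float],
    k: int,
) -> list[asyncpg.Record]:
    vec = "[" + ",".join(f"{x:g}" for x in embedding) + "]"  # pgvector text format
    return await pool.fetch(
        """
        SELECT id, symbol_id, path, content
        FROM chunks
        ORDER BY embedding <=> $1::vector
        LIMIT $2
        """,
        vec,
        k,
    )
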
02 · Prompt cache aligned with Anthropic's 5-min TTL.
core/cache.py · python
class PromptCache:
    """Cache aligned with Anthropic's 5-minute TTL."""

    TTL_SECONDS = 5 * 60

    async def get_or_compute(
        self,
        key: PromptKey,
        compute: Callable[[], Awaitable[Response]],
    ) -> CachedResponse:
        cached = await self.redis.get(key.fingerprint)
        if cached and not self._stale(cached):
            # Stored as JSON on write, so parse back into a Response on read
            return CachedResponse(
                response=Response.model_validate_json(cached),
                cache_hit=True,
            )

        response = await compute()
        await self.redis.setex(
            key.fingerprint,
            self.TTL_SECONDS,
            response.model_dump_json(),
        )
        return CachedResponse(response=response, cache_hit=False)
What the client said
“He integrated Claude into our editor and the cache hit rate is sitting at 78%. Token bill dropped by more than half month-over-month. Excellent communicator — async-friendly, no theatrics, just delivery.”
Daniel Ortiz
Product Lead · Stitch.io

Have a project like this in mind? Let's talk.

Send me a brief and I'll respond within 24 hours.

← Home · © 2025 Ali Razzaq · Contact →