Pure vector search is giving you half the answer

You asked your RAG system "which customers did the new pricing affect, and which contracts are tied to them?" Your retrieval is vector-only: no graph traversal, no fusion. The vector retriever returned 5 chunks about pricing. Great. It returned nothing about the contracts or the customer relationships, because embeddings cannot express "things connected to things." The model wrote a half-answer about pricing and made up the contract part.

This is the wall vector-only RAG hits on relational questions. Embeddings are good at semantic similarity. They are bad at structure. Graphs are the opposite. The fix is not to pick one. The fix is hybrid retrieval: run both, fuse the results, and hand the union to the LLM.

This post is the pattern, the trade-off, the 60 lines of code that fuse vector and graph results, and the rule for when hybrid earns its complexity.

Why does vector search alone miss structured relationships?

Because an embedding compresses a document into a point in vector space based on the words it contains. 2 documents that say similar things end up near each other. 2 documents that are connected but say different things do not.

A pricing doc that says "Enterprise tier: $5k/month" and a contract that says "Customer X agrees to Enterprise tier" are related because of an entity reference, not because of word overlap. Vector search finds the pricing doc when you ask about pricing and the contract when you ask about contracts. It does not find both when you ask a question that spans the relationship.

```mermaid
graph TD
    Q[Which customers did<br/>the new pricing affect?] --> V[Vector search]
    V -->|top 5| VR[Pricing docs only]

    Q --> G[Graph traversal]
    G -->|pricing - tier - customer| GR[Customer nodes]

    VR --> Fuse[Fusion]
    GR --> Fuse
    Fuse --> All[Pricing + affected customers]

    style VR fill:#fee2e2,stroke:#b91c1c
    style GR fill:#fef3c7,stroke:#b45309
    style All fill:#dcfce7,stroke:#15803d
```

Graph search walks edges. It is designed for "things connected to things." It is terrible at fuzzy text matching. Vector search is designed for fuzzy text matching and terrible at traversal. Hybrid retrieval plays to both strengths.
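To make "walks edges" concrete, here is a minimal sketch of the traversal side using a plain adjacency dict. The node names are invented for illustration; a real system would back this with NetworkX or a graph database.

```python
from collections import deque

# Toy knowledge graph: document/entity nodes and "references" edges.
# All node names here are hypothetical examples.
EDGES = {
    "pricing_doc": ["enterprise_tier"],
    "enterprise_tier": ["customer_x", "customer_y"],
    "customer_x": ["contract_17"],
    "customer_y": ["contract_42"],
}


def graph_neighbors(start: str, max_hops: int = 2) -> list[str]:
    """Breadth-first walk from a seed node, up to max_hops edges away."""
    seen, queue, found = {start}, deque([(start, 0)]), []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, depth + 1))
    return found


print(graph_neighbors("pricing_doc"))
# → ['enterprise_tier', 'customer_x', 'customer_y']
```

2 hops from the pricing doc already reaches the affected customers, which is exactly the part of the answer the vector retriever never saw.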

What does a hybrid retrieval pipeline look like?

4 stages. First, route the query. Second, run vector search and graph traversal in parallel. Third, fuse the results with a scoring function. Fourth, pass the fused set to the generator.

The router is the new piece most teams skip. Not every question needs both retrievers. A simple lookup like "what is our refund policy" is vector-only. A traversal like "who owns this feature" is graph-only. A cross-cutting question like the pricing example needs both. A classifier up front picks the mode and keeps the cost down.

```python
# filename: router.py
# description: A tiny classifier that picks vector-only, graph-only,
# or hybrid based on the question shape.
import json

from anthropic import Anthropic

client = Anthropic()

# Literal braces in the JSON example are doubled so that
# str.format only substitutes {q}.
ROUTE_PROMPT = '''Classify the question into one of: vector, graph, hybrid.
- vector: semantic lookup, definition, explanation, policy question
- graph: who owns, what depends on, which entities are connected
- hybrid: crosses both (comparisons, impact questions, root causes)

Output JSON only: {{"route": "..."}}

Question: {q}'''


def route(question: str) -> str:
    reply = client.messages.create(
        model='claude-haiku-4-5-20251001',
        max_tokens=50,
        messages=[{'role': 'user', 'content': ROUTE_PROMPT.format(q=question)}],
    )
    return json.loads(reply.content[0].text)['route']
```

A cheap model is fine for routing. The decision is short and the signal is in the question shape, not in deep reasoning.

How do you fuse vector and graph results?

Use Reciprocal Rank Fusion (RRF). It is the same algorithm search engines use to combine keyword and semantic results, and it works just as well on graph and vector results. No tuning, no weights, no training. Just 10 lines of code.

```python
# filename: fusion.py
# description: Reciprocal rank fusion for hybrid retrieval results.
# Blends vector and graph hits into one scored list.
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for results in result_lists:
        # enumerate is 0-based, so rank + 1 gives the 1-based rank
        # in the 1 / (k + rank) formula.
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores.keys(), key=lambda d: scores[d], reverse=True)
```

The math is simple. Each document gets a score of 1 / (k + rank) for every list it appears in. Documents that rank well in both lists get the highest combined score. k = 60 is the default from the original RRF paper and it works in practice without tuning.

The beauty is that you do not need to calibrate vector similarity scores against graph path scores. RRF operates on ranks, not scores, so it is robust to wildly different scoring scales. That is exactly what you need when fusing a cosine similarity with a path-length score.
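To see the arithmetic on a concrete pair of ranked lists (the document IDs are invented), here is the same fusion function run standalone:

```python
# Same RRF function as in fusion.py, redefined so this snippet runs alone.
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores.keys(), key=lambda d: scores[d], reverse=True)


vector_hits = ["pricing_faq", "tier_table", "old_blog_post"]
graph_hits = ["tier_table", "customer_x_contract", "pricing_faq"]

print(reciprocal_rank_fusion([vector_hits, graph_hits]))
# → ['tier_table', 'pricing_faq', 'customer_x_contract', 'old_blog_post']
```

tier_table (ranked 2nd and 1st) edges out pricing_faq (ranked 1st and 3rd) because it ranks well in both lists, which is the behavior you want from fusion.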

How do you wire vector and graph retrievers together?

Run them in parallel, fuse the results, then hand the fused set to the generator. asyncio.gather is the simplest concurrency primitive for this.

```python
# filename: hybrid_retrieve.py
# description: The top-level hybrid retriever. Runs vector and graph
# searches in parallel, fuses results, and returns the top-k chunks.
import asyncio

from app.vector_store import vector_search  # async, returns list[doc_id]
from app.graph_store import graph_search  # async, returns list[doc_id]
from fusion import reciprocal_rank_fusion
from router import route


async def hybrid_retrieve(question: str, k: int = 10) -> list[str]:
    # route() makes a blocking API call; keep it off the event loop.
    mode = await asyncio.to_thread(route, question)
    if mode == 'vector':
        return await vector_search(question, k=k)
    if mode == 'graph':
        return await graph_search(question, k=k)

    v, g = await asyncio.gather(
        vector_search(question, k=k),
        graph_search(question, k=k),
    )
    return reciprocal_rank_fusion([v, g])[:k]
```

3 branches: 2 single-retriever fast paths and 1 fusion path. Notice that vector_search and graph_search are whatever implementations you already have. RRF treats them as black boxes that return ranked lists of document IDs.

For the full walkthrough of how hybrid retrieval fits into a production agentic RAG stack with planning, rewriting, and self-correction, the Agentic RAG Masterclass covers it module by module. The free RAG Fundamentals primer is the right starting point if you are still building your first pipeline.

When is hybrid retrieval worth the complexity?

Not always. Adding a graph store doubles your infra footprint and adds a second index to keep in sync with the source of truth. The added quality has to earn those costs.

| Workload | Vector only | Hybrid |
| --- | --- | --- |
| Pure semantic lookup | Wins | Overkill |
| Single-fact FAQ | Wins | Overkill |
| Entity-centric questions ("who owns X") | Fails | Wins |
| Impact questions ("what does X affect") | Fails | Wins |
| Root-cause questions ("why is Y broken") | Mediocre | Wins |
| Comparison questions ("X vs Y") | Mediocre | Wins |

The rule: if more than 20 percent of your traffic is in rows 3 through 6, build hybrid. Below that, vector-only is cheaper and nearly as good. The router catches the 5 to 10 percent of questions that genuinely need graph traversal and keeps the rest on the fast path.

If you are shipping a documentation search or a marketing chatbot, you probably do not need this. If you are shipping a product assistant for an engineering team that gets asked "who owns this service" and "what depends on that table" every day, you almost certainly do.

What graph store should you use?

3 reasonable choices, each at a different point on the ops-cost spectrum.

Neo4j is the default if you already know Cypher. It has the richest query language and the best tooling. It is also the most operationally heavy.

NetworkX is in-memory Python and zero-infra. It works for under 100k nodes and is perfect for proofs of concept, small internal tools, and anything you can rebuild from scratch on startup.

A graph layer on top of Postgres (recursive CTEs) is the pragmatic choice for most teams. If you already run Postgres, you can model the graph as edge and node tables and query with recursive CTEs. Performance caps out earlier than Neo4j but the operational simplicity is worth a lot.
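As a sketch of what that traversal could look like, assume hypothetical nodes(id, doc_id) and edges(src, dst) tables; adapt the table and column names to your own schema.

```python
# Hypothetical schema: nodes(id, doc_id), edges(src, dst).
# Walks up to 2 hops out from a seed node with a recursive CTE,
# then maps the reached nodes back to document IDs.
NEIGHBOR_QUERY = """
WITH RECURSIVE walk(node_id, depth) AS (
    SELECT dst, 1 FROM edges WHERE src = %(seed)s
  UNION
    SELECT e.dst, w.depth + 1
    FROM edges e
    JOIN walk w ON e.src = w.node_id
    WHERE w.depth < 2
)
SELECT DISTINCT n.doc_id
FROM walk w
JOIN nodes n ON n.id = w.node_id;
"""

# Usage with a psycopg cursor (untested sketch):
# cur.execute(NEIGHBOR_QUERY, {"seed": "pricing_doc"})
# doc_ids = [row[0] for row in cur.fetchall()]
```

The UNION (rather than UNION ALL) plus the depth cap keeps the walk from looping forever on cyclic graphs.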

Pick based on graph size and ops tolerance, not benchmarks. For under 100k nodes, Postgres or NetworkX is fine. Above that, Neo4j earns its keep.

What to do Monday morning

  1. Sample 100 production questions. Label each as vector-only, graph-only, or hybrid. Count the hybrid bucket. Below 20 percent, stop here and stay on vector-only.
  2. If the bucket is above 20 percent, build the smallest possible graph index. Start with NetworkX or Postgres. Do not install Neo4j in week 1.
  3. Add the RRF fusion function from this post. Plug it between your existing retriever and generator. 10 lines, no tuning.
  4. Add the router. Use a cheap model and the 3-branch prompt. Route aggressively to vector-only to keep cost down.
  5. Run the same eval set through vector-only and hybrid. Measure accuracy on the hybrid bucket specifically. Expect a 10 to 20 point lift on those questions, zero change on the rest.
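Step 1 needs nothing more than a counter. A sketch, with made-up label counts:

```python
# Step 1 as code: measure the share of sampled questions labeled hybrid.
def hybrid_share(labels: list[str]) -> float:
    """labels: one of 'vector', 'graph', 'hybrid' per sampled question."""
    if not labels:
        return 0.0
    return labels.count("hybrid") / len(labels)


# 100 sampled questions: 70 vector-only, 12 graph-only, 18 hybrid
labels = ["vector"] * 70 + ["graph"] * 12 + ["hybrid"] * 18
share = hybrid_share(labels)
print(f"{share:.0%} hybrid")  # → 18% hybrid
if share < 0.20:
    print("stay on vector-only")
```

18 percent is under the threshold, so in this made-up sample you would stop at step 1.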

The headline: hybrid retrieval is vector-only plus a graph store plus 10 lines of rank fusion. Most teams do not need it. The ones that do see a big lift on a specific question shape and nothing else.

Frequently asked questions

What is hybrid retrieval in RAG?

Hybrid retrieval combines multiple retrievers (typically vector search and a keyword index or graph traversal) and fuses their results into a single ranked list before passing to the LLM. The goal is to cover question shapes that a single retriever fails on. Vector search is strong on semantic similarity; graph search is strong on structural traversal. Fusing them plays to both strengths.

What is reciprocal rank fusion?

Reciprocal rank fusion (RRF) is an algorithm that combines multiple ranked lists into a single list by scoring each document as 1 / (k + rank) for every list it appears in, then sorting by the sum. It needs no tuning, no weights, and no score calibration between retrievers. The default k = 60 from the original paper works in practice, which makes RRF the simplest fusion algorithm worth shipping.

When should I use graph search instead of vector search in RAG?

When the question is about relationships between entities, not about the semantic content of documents. "Who owns this service," "what depends on this table," "which customers are affected by this change" are all traversal questions that graph search handles better than embeddings. If more than 20 percent of your traffic is this shape, add a graph retriever.

What graph store should I use for hybrid RAG?

Start with Postgres and recursive CTEs if you already run Postgres. Use NetworkX for proofs of concept under 100k nodes. Move to Neo4j only when you hit scale limits or need Cypher's expressiveness. The operational overhead of a dedicated graph database is rarely worth it for the first production version.

Does hybrid retrieval slow down RAG pipelines?

A little. Running vector and graph searches in parallel (via asyncio.gather) adds roughly the latency of the slower of the 2, plus the fusion cost (negligible). In practice the added latency is 50 to 150 milliseconds, which is invisible to a user reading a streamed answer. The router makes sure only hybrid-eligible questions pay this cost.
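The "slower of the 2" claim is easy to check with stand-in retrievers; the sleep durations below are made-up latencies, not measurements.

```python
import asyncio
import time


async def fake_vector_search(q: str) -> list[str]:
    await asyncio.sleep(0.05)  # stand-in for a 50 ms ANN query
    return ["doc_a", "doc_b"]


async def fake_graph_search(q: str) -> list[str]:
    await asyncio.sleep(0.10)  # stand-in for a 100 ms traversal
    return ["doc_b", "doc_c"]


async def main() -> float:
    start = time.perf_counter()
    await asyncio.gather(fake_vector_search("q"), fake_graph_search("q"))
    return time.perf_counter() - start


elapsed = asyncio.run(main())
print(f"{elapsed * 1000:.0f} ms")  # ~100 ms: the slower leg, not the sum
```

Total wall time tracks the slower retriever, so the hybrid path costs you the graph traversal's latency, not the sum of both.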

Key takeaways

  1. Vector search excels at semantic similarity and fails at structural traversal. Graph search is the opposite. Neither alone covers all real questions.
  2. Hybrid retrieval runs both in parallel and fuses the results. The union beats either alone on comparison, impact, and relationship questions.
  3. Reciprocal rank fusion is 10 lines and works without tuning. It operates on ranks, not scores, so it is robust to wildly different retriever scoring scales.
  4. Add a router up front so cheap lookups stay on vector-only. Routing aggressively keeps the total cost low even with a second retriever available.
  5. Start simple on the graph store: Postgres CTEs or NetworkX. Neo4j only when the scale or query expressiveness demands it.
  6. To see hybrid retrieval wired into a full production agentic RAG stack alongside planning, reranking, and grounding, walk through the Agentic RAG Masterclass, or start with the RAG Fundamentals primer.

For the original reciprocal rank fusion paper that this post's fusion function is based on, see Cormack, Clarke, and Büttcher's 2009 paper on RRF. The algorithm has not been beaten by anything simpler in the decade since.
