Combining vector stores in RAG: multi-source retrieval
Your RAG pipeline needs docs from 3 sources and you put them all in one index
Your agent needs to answer questions using internal docs, a public knowledge base, and the user's own uploaded files. You shoved everything into one FAISS index. Now the retriever sometimes returns public docs when the user asks about internal policy, and internal docs when they ask about their own upload. The ranking is a mess because different sources have different writing styles and embedding distributions.
The fix is to keep each source in its own vector store and merge the results at query time. Each store is tuned for its own content. The merge layer handles ranking, deduplication, and source tagging. The agent sees a unified result with clear provenance.
This post is the multi-source vector store pattern: when to split into multiple stores, how to merge results, how to deduplicate near-identical hits, and the source-tagging rule that helps the agent cite correctly.
Why put docs from different sources in the same index?
You should not. 3 specific failure modes of a single combined index:
- Cross-source ranking drift. Public docs are more polished and rank higher in cosine similarity. Internal docs get buried even when they are more relevant.
- No source filtering. When the user asks "what do MY docs say," the retriever has no way to scope down to just their source.
- Embedding distribution mismatch. Content from different sources may have been embedded with different models or chunk sizes, producing non-comparable vector distances.
Keeping each source in its own store lets you tune each one, filter by source at query time, and merge results with explicit ranking logic.
```mermaid
graph LR
    Query[Query] --> Embed[Embedding model]
    Embed --> S1[Store 1: internal docs]
    Embed --> S2[Store 2: public KB]
    Embed --> S3[Store 3: user uploads]
    S1 --> Merge[Merge and rerank]
    S2 --> Merge
    S3 --> Merge
    Merge --> Dedup[Deduplicate]
    Dedup --> Tagged[Tagged results]
    Tagged --> LLM[LLM with source info]
    style Merge fill:#dbeafe,stroke:#1e40af
    style Tagged fill:#dcfce7,stroke:#15803d
```
How do you merge results from multiple stores?
2 strategies. Start with Reciprocal Rank Fusion (RRF). Move to score-weighted merging only if RRF does not fit your use case.
Reciprocal Rank Fusion (default)
```python
# filename: app/rag/merge.py
# description: Merge results from multiple vector stores using Reciprocal Rank Fusion.
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RetrievalResult:
    doc_id: str
    content: str
    source: str
    score: float


def reciprocal_rank_fusion(
    result_lists: list[list[RetrievalResult]],
    k: int = 60,
) -> list[RetrievalResult]:
    scores: dict[str, float] = defaultdict(float)
    doc_map: dict[str, RetrievalResult] = {}
    for results in result_lists:
        for rank, r in enumerate(results):
            # rank is 0-based here, so the top hit in a list contributes 1 / (k + 1)
            scores[r.doc_id] += 1.0 / (k + rank + 1)
            if r.doc_id not in doc_map:
                doc_map[r.doc_id] = r
    ranked_ids = sorted(scores.keys(), key=lambda i: scores[i], reverse=True)
    return [doc_map[i] for i in ranked_ids]
```
RRF is the same algorithm search engines use to combine keyword and semantic results. No tuning, no weights, and it is robust to score-scale differences across stores.
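As a quick sanity check, here is the function run on two toy result lists. The definitions are repeated inline so the snippet runs standalone; the document IDs and contents are made up:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RetrievalResult:
    doc_id: str
    content: str
    source: str
    score: float


def reciprocal_rank_fusion(result_lists, k=60):
    # Same logic as app/rag/merge.py above, repeated so this demo runs standalone.
    scores = defaultdict(float)
    doc_map = {}
    for results in result_lists:
        for rank, r in enumerate(results):
            scores[r.doc_id] += 1.0 / (k + rank + 1)
            doc_map.setdefault(r.doc_id, r)
    return [doc_map[i] for i in sorted(scores, key=scores.get, reverse=True)]


internal = [
    RetrievalResult("policy-7", "Remote work policy...", "internal-docs", 0.81),
    RetrievalResult("faq-2", "Remote work FAQ...", "public-kb", 0.78),
]
public = [
    RetrievalResult("faq-2", "Remote work FAQ...", "public-kb", 0.95),
    RetrievalResult("kb-9", "Travel guidance...", "public-kb", 0.64),
]

fused = reciprocal_rank_fusion([internal, public])
# faq-2 appears in both lists, so its summed RRF score beats either single-list hit
print([r.doc_id for r in fused])  # ['faq-2', 'policy-7', 'kb-9']
```

Note that faq-2 wins even though policy-7 tops the internal list: appearing in two lists sums two reciprocal-rank contributions, which is exactly the behavior you want from a cross-source merge.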
For the deeper hybrid retrieval pattern with vector + graph, see the Hybrid retrieval in RAG post.
How do you deduplicate near-identical results?
Different sources sometimes contain the same document (a public doc was also uploaded by a user, for example). Detect duplicates by content similarity and keep the highest-ranked version.
```python
# filename: app/rag/dedupe.py
# description: Deduplicate near-identical retrieval results by content prefix.
import hashlib

from app.rag.merge import RetrievalResult


def dedupe(results: list[RetrievalResult], prefix_len: int = 200) -> list[RetrievalResult]:
    seen: set[str] = set()
    out: list[RetrievalResult] = []
    for r in results:
        key = hashlib.sha256(r.content[:prefix_len].encode()).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        out.append(r)
    return out
```
Hash the first 200 characters. Hashing the full content is too strict: a small edit or whitespace difference anywhere in the document produces a different hash, so near-duplicates slip through. Pairwise similarity comparison is accurate but too expensive at query time. A prefix hash catches most near-duplicates in practice without over-merging.
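A minimal demo of the prefix-hash behavior, with the function repeated inline so the snippet runs standalone (IDs and contents are invented):

```python
import hashlib
from dataclasses import dataclass


@dataclass
class RetrievalResult:
    doc_id: str
    content: str
    source: str
    score: float


def dedupe(results, prefix_len=200):
    # Same logic as app/rag/dedupe.py above.
    seen = set()
    out = []
    for r in results:
        key = hashlib.sha256(r.content[:prefix_len].encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


# A public doc that a user also uploaded, with a trailing annotation appended.
doc = "Refund policy: purchases can be returned within 30 days of delivery. " * 5
results = [
    RetrievalResult("kb-1", doc, "public-kb", 0.92),                      # highest-ranked copy
    RetrievalResult("up-4", doc + "(uploaded copy)", "user-upload", 0.88),  # same first 200 chars
    RetrievalResult("kb-3", "Shipping policy: orders ship in 2 business days.", "public-kb", 0.71),
]

kept = dedupe(results)
# The user-upload duplicate is dropped; the earlier (higher-ranked) copy survives.
print([(r.doc_id, r.source) for r in kept])
```

Because results arrive already sorted by the merge step, "keep the first occurrence" is the same as "keep the highest-ranked version."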
How do you run the full multi-source retrieval?
Query each store in parallel, merge with RRF, deduplicate, return top K.
```python
# filename: app/rag/multi_source.py
# description: Multi-source retrieval combining 3 vector stores.
import asyncio

from app.rag.dedupe import dedupe
from app.rag.merge import RetrievalResult, reciprocal_rank_fusion


async def multi_source_retrieve(
    query: str,
    stores: list,
    embedder,
    k: int = 5,
) -> list[RetrievalResult]:
    query_vec = embedder.encode(query, normalize_embeddings=True)

    # Query each store in parallel
    tasks = [store.search(query_vec, k=k * 2) for store in stores]
    all_results = await asyncio.gather(*tasks)

    # Merge, dedupe, take top k
    merged = reciprocal_rank_fusion(list(all_results))
    deduped = dedupe(merged)
    return deduped[:k]
```
Query each store for k * 2 candidates so the merge has room to work. Final result is k after dedup. Parallel queries via asyncio.gather make the combined latency roughly equal to the slowest single store, not the sum.
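To see the latency claim concretely, here is a sketch with stub stores whose search just sleeps. The store names and delays are invented; a real store would run an ANN search in that slot:

```python
import asyncio
import time


class StubStore:
    """Stands in for a vector store; search() sleeps to simulate query latency."""

    def __init__(self, name: str, delay: float):
        self.name = name
        self.delay = delay

    async def search(self, query_vec, k: int):
        await asyncio.sleep(self.delay)
        return [f"{self.name}-hit-{i}" for i in range(k)]


async def main():
    stores = [
        StubStore("internal", 0.03),
        StubStore("public", 0.05),   # slowest store
        StubStore("uploads", 0.02),
    ]
    start = time.perf_counter()
    hits = await asyncio.gather(*(s.search(None, k=10) for s in stores))
    elapsed = time.perf_counter() - start
    return elapsed, hits


elapsed, hits = asyncio.run(main())
# Parallel fan-out: wall time tracks the slowest store (~50 ms), not the 100 ms sum.
print(f"{sum(len(h) for h in hits)} candidates in {elapsed * 1000:.0f} ms")
```

Swap the stubs for real stores and the shape stays the same: one gather, one merge, one dedupe.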
For the broader RAG architecture this fits into, see the Agentic RAG with LangGraph post.
Why tag each result with its source?
Because the agent needs to cite provenance in the final answer. Without a source tag, the user cannot tell if an answer came from the public KB (generally trusted) or their own upload (specific to them) or the internal docs (authoritative for policy).
Include the source in the prompt to the final LLM:
```text
Context:
[Source: internal-docs] ...first chunk...
[Source: public-kb] ...second chunk...
[Source: user-upload] ...third chunk...
Question: ...
Answer based on the context above, citing the source tag for each claim.
```
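A small helper that renders retrieval results into that format. This is a sketch: `build_prompt` is a hypothetical name, and the field names follow the `RetrievalResult` dataclass from earlier, repeated inline so the snippet runs standalone:

```python
from dataclasses import dataclass


@dataclass
class RetrievalResult:
    doc_id: str
    content: str
    source: str
    score: float


def build_prompt(results, question: str) -> str:
    # One tagged line per retrieved chunk, then the question and instruction.
    context = "\n".join(f"[Source: {r.source}] {r.content}" for r in results)
    return (
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer based on the context above, citing the source tag for each claim."
    )


results = [
    RetrievalResult("p1", "Expenses over $500 need VP approval.", "internal-docs", 0.9),
    RetrievalResult("u1", "My Q3 expense report totals $1,240.", "user-upload", 0.8),
]
prompt = build_prompt(results, "Does my Q3 report need approval?")
print(prompt)
```

Keeping the tag format fixed (`[Source: name]`) also makes it trivial to parse citations back out of the LLM's answer if you want to render them as links.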
The LLM will then include source attribution in its response, which users find far more trustworthy than an ungrounded answer.
When should you NOT use multi-source retrieval?
When all your content is from the same source. 3 specific cases where a single index wins:
- Small uniform corpus. Under 10k documents, all internal, all same format. A single FAISS index is simpler and faster.
- Same author, same time period. If all docs were written by the same team in the same quarter, embedding distributions are consistent enough for one index.
- No source-based filtering needed. If the user never asks "what do MY docs say," there is no reason to split.
Multi-source retrieval earns its complexity when sources differ in style, freshness, or scope, or when users need to filter by source.
For the FAISS setup that each store uses under the hood, see the FAISS vector store RAG post.
What to do Monday morning
- Audit your current RAG corpus. If it combines 2+ distinct sources (internal, public, user), split them into separate stores.
- Build the multi-source retriever with RRF merging. 40 lines of Python on top of your existing stores.
- Add source tagging to every result. Include the tag in the prompt to the final LLM.
- Test with a query that should pull from only one source (e.g., "what does my uploaded PDF say"). Confirm the top results come from that source.
- Benchmark latency. Parallel queries should keep p95 within 50 ms of the slowest single store.
The headline: multi-source vector stores beat a single combined index when content comes from different sources with different styles or scopes. Split the stores, merge with RRF, dedupe, tag the sources. 40 lines of Python, noticeably better retrieval.
Frequently asked questions
When should I use multiple vector stores instead of one?
When your content comes from different sources with different writing styles, freshness, or scopes. Internal docs + public knowledge base + user uploads is the canonical case. A single index blends them unevenly; separate stores let you tune each one and filter by source at query time.
What is Reciprocal Rank Fusion?
RRF is an algorithm that combines multiple ranked lists into one by scoring each document as 1 / (k + rank) for every list it appears in (with rank starting at 1), then sorting by the sum. It needs no tuning and no score calibration across stores, which makes it robust to the different score scales different vector stores produce.
How do I deduplicate near-identical results?
Hash the first 200 characters of each result's content. Full-content hashing is too strict (a small edit or whitespace difference anywhere produces a different hash), while pairwise similarity comparison is too expensive at query time. A 200-character prefix hash catches most near-duplicates without over-merging.
Why should I tag results with their source?
Because the LLM uses the source tag to cite provenance in its answer, which users find much more trustworthy than an ungrounded response. The tag also lets you filter by source at query time when the user asks a source-specific question.
Is multi-source retrieval slower than single-source?
Not much. Query each store in parallel with asyncio.gather. The combined latency is roughly equal to the slowest single store, not the sum. For 3 stores with sub-5ms latency each, the full multi-source retrieval typically lands under 10 ms end-to-end including merge and dedupe.
Key takeaways
- Putting documents from different sources in one index produces ranking drift, filtering limitations, and embedding distribution mismatches.
- Keep each source in its own vector store. Query all stores in parallel, merge results with Reciprocal Rank Fusion, deduplicate by content prefix hash.
- RRF is the default merge algorithm. No tuning, no score calibration, robust to differing score scales across stores.
- Tag every result with its source. The LLM uses the tag to cite provenance, which users find more trustworthy.
- Parallel retrieval keeps latency close to the slowest single store, not the sum; asyncio.gather is the right primitive.
- To see multi-source retrieval wired into a full production RAG stack with reranking and evaluation, walk through the Agentic RAG Masterclass, or start with the RAG Fundamentals primer.
For the LangChain documentation on multi-retriever chains and ensemble retrievers, see the LangChain ensemble retriever guide.