Multi-hop RAG: when one retrieval isn't enough
In our last post, we built a resilient agent that can handle tool failures. It's reliable, but it's still "dumb." It can only answer simple, one-step questions.
This post is for you if you've ever built a RAG system and watched it fail this simple query:
"What is the competitor to the product mentioned in our latest press release?"
A simple, single-retrieval RAG system will fail this query every time. It's a two-part question, and a one-step retriever has no mechanism for chaining the parts together.
Today, we'll build a Multi-Hop RAG Agent that can "think" in multiple steps to solve complex research questions.
What's the problem with the one-step retriever?
Let's trace the failure of our simple RAG bot.
User Query: "What is the competitor to the product mentioned in our latest press release?"
sequenceDiagram
participant User
participant Agent
participant VectorDB
User->>Agent: "Competitor to product in press release?"
activate Agent
Agent->>VectorDB: "Find docs about 'competitor' + 'press release'"
activate VectorDB
VectorDB-->>Agent: "Here is the 'Latest Press Release' doc."
deactivate VectorDB
Agent->>Agent: **Reads Doc:** "Our new product, 'Model-V', is launching."
Agent-->>User: ❌ "The press release mentions 'Model-V', but does not mention any competitors."
deactivate Agent
Why this is bad:
- The agent found the right first document (the press release).
- But it stopped. It didn't "understand" that the query was a 2-step process:
- Hop 1: Find the press release to identify the product (Answer: "Model-V").
- Hop 2: Start a new search for "competitors of Model-V."
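The two hops above can be sketched as plain function calls, where the output of hop 1 becomes the input of hop 2. Both retrievers here are hypothetical mocks, just to make the chaining concrete:

```python
import re

def hop_1_retrieve(query: str) -> str:
    # Mocked retrieval: a real system would hit the vector store.
    return "Press release: Our new product, 'Model-V', is launching."

def hop_2_retrieve(query: str) -> str:
    # Mocked second retrieval, run against a *new* query.
    return "Cognito Inc. is the main competitor to Model-V."

# Hop 1: find the press release and extract the product name.
context_1 = hop_1_retrieve("latest press release")
product = re.search(r"'([^']+)'", context_1).group(1)

# Hop 2: the output of hop 1 becomes the input of hop 2.
context_2 = hop_2_retrieve(f"competitors of {product}")
print(product, "->", context_2)
```

A single-shot retriever never performs that second call; everything we build below exists to automate it.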
What is a query decomposition graph?
We need our agent to stop trying to answer in one shot. We need it to Decompose the query.
We'll build an agent (using LangGraph) that can use an LLM to generate new questions for itself.
graph TD
A["User Query: 'Competitor to product in press release?'"] --> B(Hop 1: RAG)
B -- "Context: '...our new product, Model-V...'" --> C(LLM: Decompose & Generate)
C -- "Sub-Query: 'Who is the competitor to Model-V?'" --> D(Hop 2: RAG / Web Search)
D -- "Context: 'Cognito Inc. is the main competitor...'" --> C
C --> E["Final Answer: 'The press release mentions 'Model-V'. Its main competitor is Cognito Inc.'"]
style B fill:#e3f2fd,stroke:#0d47a1
style C fill:#e8f5e9,stroke:#388e3c
style D fill:#e3f2fd,stroke:#0d47a1
style E fill:#e8f5e9,stroke:#388e3c
This is a Multi-Hop agent. It uses the output of one retrieval step as the input for the next retrieval step.
How do you build a multi-hop graph?
We'll define a LangGraph GraphState that can hold a list of sub-questions and expand itself.
Brick 1: the "memory" (GraphState)
Our "memory" needs to be a list of questions to solve, and a list of facts we've found.
# filename: example.py
# description: Code example from the post.
from typing import TypedDict, List

from langgraph.graph import StateGraph, END


class GraphState(TypedDict):
    original_query: str
    questions: List[str]  # The "to-do" list of questions
    answers: List[str]    # The "done" list of facts
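Because `GraphState` is a plain `TypedDict`, a state update is just a dict merge. This sketch mimics what happens when a node returns a partial update (with default, non-reducer keys, LangGraph overwrites each returned key — shown here as a plain `dict` merge):

```python
from typing import TypedDict, List

class GraphState(TypedDict):
    original_query: str
    questions: List[str]  # the "to-do" list
    answers: List[str]    # the "done" list

state: GraphState = {
    "original_query": "Competitor to product in press release?",
    "questions": ["What is the latest press release?"],
    "answers": [],
}

# A node returns only the keys it changed; the graph merges them in.
update = {"questions": [], "answers": ["The press release mentions 'Model-V'."]}
state = {**state, **update}
print(state["answers"])
```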
Brick 2: the "planner" node
This is our new "brain." Its job is to take the user's query and decompose it into a step-by-step plan.
from openai import OpenAI
import json

llm_client = OpenAI()

# --- Node 1: The Planner (Query Decomposer) ---
def plan_queries(state):
    print("---NODE: PLAN_QUERIES---")
    prompt = f"""You are a research assistant.
Break down the following complex question into a series of
simple, searchable sub-questions.

Question: {state['original_query']}

Return a JSON object with a single key "questions" holding a list of strings.
Example: {{"questions": ["question 1", "question 2"]}}
"""
    response = llm_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        # JSON mode guarantees a parseable JSON *object*, not a bare list,
        # so the prompt asks for {"questions": [...]} and we unwrap it here.
        response_format={"type": "json_object"},
    )
    sub_questions = json.loads(response.choices[0].message.content)["questions"]
    return {"questions": sub_questions, "answers": []}
Observation: If we send "What is the competitor to the product in the latest press release?", this node will output a list like:
["What is the latest press release?", "Who is the competitor to the product mentioned?"]
Brick 3: the "executor" node (our RAG tool)
This node's job is to take one question from the "to-do" list, run our simple RAG pipeline on it, and add the fact to our "done" list.
# This is our simple RAG function from Post 1
def simple_rag_pipeline(query: str) -> str:
    # (Code to retrieve from vector store and generate an answer)
    return "The latest press release mentions 'Model-V'."  # Mocked answer

# --- Node 2: The Executor (Runs one sub-question) ---
def execute_search(state):
    print("---NODE: EXECUTE_SEARCH---")
    # Get the *first* question from the "to-do" list
    question = state["questions"][0]
    # "Pop" it off the to-do list
    remaining_questions = state["questions"][1:]
    # Run RAG on just that one question
    answer = simple_rag_pipeline(question)
    # Add the answer to our "done" list
    new_answers = state["answers"] + [answer]
    return {"questions": remaining_questions, "answers": new_answers}
Brick 4: the "wires" (the loop)
Now we wire it all together.
# --- The "Decision" Edge ---
def check_for_more_questions(state):
# This is our loop condition
if len(state["questions"]) > 0:
return "continue" # Go back to the executor
else:
return "end" # All done, go to the final answer
# --- Build the Graph ---
workflow = StateGraph(GraphState)
workflow.add_node("plan_queries", plan_queries)
workflow.add_node("execute_search", execute_search)
workflow.add_node("generate_final_answer", ...) # The final LLM call
workflow.set_entry_point("plan_queries")
workflow.add_edge("plan_queries", "execute_search")
# This is our Multi-Hop Loop!
workflow.add_conditional_edges(
"execute_search",
check_for_more_questions,
{
"continue": "execute_search", # Loop back to itself!
"end": "generate_final_answer"
}
)
workflow.add_edge("generate_final_answer", END)
app = workflow.compile()
Result: We've built an agent that can reason. It will "plan" its work, then "execute" that plan one step at a time, looping over the execute_search node until its "to-do" list is empty.
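The whole plan-then-execute loop can be simulated without LangGraph or an LLM. This self-contained sketch uses a mocked planner and mocked RAG answers (all queries and answers here are hypothetical) to show the loop mechanics end to end:

```python
def plan_queries(state):
    # Mocked planner: a real one asks the LLM to decompose the query.
    return {
        "questions": [
            "What is the latest press release?",
            "Who is the competitor to Model-V?",
        ],
        "answers": [],
    }

def execute_search(state):
    # Mocked RAG lookup, keyed on the sub-question.
    mock_rag = {
        "What is the latest press release?": "The press release mentions 'Model-V'.",
        "Who is the competitor to Model-V?": "Cognito Inc. is the main competitor.",
    }
    question = state["questions"][0]
    return {
        "questions": state["questions"][1:],
        "answers": state["answers"] + [mock_rag[question]],
    }

state = {"original_query": "Competitor to product in press release?"}
state.update(plan_queries(state))

hops = 0
while state["questions"]:  # the conditional edge's loop condition
    state.update(execute_search(state))
    hops += 1

print(hops, state["answers"])
```

Two hops in, two facts out — exactly the trace the compiled LangGraph app produces, minus the LLM calls.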
Challenge for you
- Use Case: Our `execute_search` node is simple. It always uses our `simple_rag_pipeline`.
- The Problem: What if one sub-question is "What is the CEO's name?" (internal data), but another is "What is our competitor's stock price?" (external data)?
- Your Task: How would you combine this post with our previous post? How could you add a Router inside the `execute_search` node to decide which tool (internal RAG vs. web search) to use for each sub-question?
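As a starting point for the challenge, here is one hedged sketch: a keyword-based router inside `execute_search` that picks a tool per sub-question. The `route_query` heuristic and both tool stubs are hypothetical; a production router would more likely use an LLM classifier, as in our previous post:

```python
def internal_rag(query: str) -> str:
    return f"[internal KB] answer to: {query}"

def web_search(query: str) -> str:
    return f"[web] answer to: {query}"

# Crude heuristic: company-specific wording goes to the internal KB.
INTERNAL_KEYWORDS = ("our", "ceo", "press release", "internal")

def route_query(query: str) -> str:
    q = query.lower()
    return "internal" if any(k in q for k in INTERNAL_KEYWORDS) else "web"

def execute_search(state):
    question = state["questions"][0]
    tool = internal_rag if route_query(question) == "internal" else web_search
    return {
        "questions": state["questions"][1:],
        "answers": state["answers"] + [tool(question)],
    }

state = {
    "questions": [
        "What is our CEO's name?",
        "What is the competitor's stock price?",
    ],
    "answers": [],
}
while state["questions"]:
    state.update(execute_search(state))
print(state["answers"])
```

The node signature is unchanged, so this drops into the existing graph without touching the planner or the conditional edges.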
Frequently asked questions
When should I implement multi-hop RAG instead of simple retrieval?
Implement multi-hop when simple RAG fails on compound queries that require multiple retrieval steps - like finding a product in a press release then searching for its competitors. Single-hop retrievers can't chain these together. If you're seeing consistent failures on multi-part questions in production, multi-hop's added complexity pays off against the cost of wrong answers.
How do I decompose queries into sub-questions for multi-hop agents?
Have the LLM generate sub-questions from the original query, store them in a to-do list, then execute each sequentially. LangGraph manages state between decomposition and retrieval phases. This requires no training data - the LLM is naturally good at breaking down multi-step problems. Loop through the to-do list until empty, collecting answers as you go.
How do I route different sub-questions to different retrieval tools?
Add a router node that decides which tool fits each sub-question - internal knowledge base for company data, web search for external research. Without routing, you waste latency and cost. The post's challenge walks through injecting routing into your execution loop, combined with error handling so each hop can fail and retry with alternatives.
For the full reference, see the LangGraph documentation.
Key takeaways
- Multi-hop queries require decomposition: Complex questions need to be broken into simpler sub-questions that can be answered sequentially
- State management enables iteration: Using a list of questions and answers in GraphState allows the agent to track progress through multiple retrieval steps
- The planner generates the roadmap: An LLM-based planner node decomposes complex queries into a series of searchable sub-questions
- The executor runs one step at a time: Each iteration of the executor answers one sub-question and updates the state
- Loops enable multi-step reasoning: Conditional edges that loop back to the executor create the multi-hop retrieval pattern
For more on advanced RAG patterns, see our advanced RAG guide and our agent framework comparison.
For more on building production AI systems, check out our AI Bootcamp for Software Engineers.
Take the next step
- RAG Fundamentals Workshop: Build production RAG pipelines from scratch
- Agentic RAG & Text-to-SQL Workshop: Advanced multi-hop retrieval with LangGraph
Ready to go deeper?
Go beyond articles. Build production AI systems with hands-on workshops and our intensive AI Bootcamp.