Multi-hop RAG: when one retrieval isn't enough
In our last post, we built a resilient agent that can handle tool failures. It's reliable, but it's still "dumb." It can only answer simple, one-step questions.
This post is for you if you've ever built a RAG system and watched it fail this simple query:
"What is the competitor to the product mentioned in our latest press release?"
A simple, single-retrieval RAG system will fail this query every time. It's a two-part question, and a one-step retriever has no mechanism for chaining the parts together.
Today, we'll build a Multi-Hop RAG Agent that can "think" in multiple steps to solve complex research questions.
What's the problem with the one-step retriever?
Let's trace the failure of our simple RAG bot.
User Query: "What is the competitor to the product mentioned in our latest press release?"
sequenceDiagram
participant User
participant Agent
participant VectorDB
User->>Agent: "Competitor to product in press release?"
activate Agent
Agent->>VectorDB: "Find docs about 'competitor' + 'press release'"
activate VectorDB
VectorDB-->>Agent: "Here is the 'Latest Press Release' doc."
deactivate VectorDB
Agent->>Agent: **Reads Doc:** "Our new product, 'Model-V', is launching."
Agent-->>User: ❌ "The press release mentions 'Model-V', but does not mention any competitors."
deactivate Agent
Why this is bad:
- The agent found the right first document (the press release).
- But it stopped. It didn't "understand" that the query was a 2-step process:
- Hop 1: Find the press release to identify the product (Answer: "Model-V").
- Hop 2: Start a new search for "competitors of Model-V."
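The two hops above can be sketched as plain function calls, where the output of hop 1 becomes the input of hop 2. Both retrievers here are hypothetical mocks, just to make the chaining concrete:

```python
import re

def hop_1_retrieve(query: str) -> str:
    # Mocked retrieval: a real system would hit the vector store.
    return "Press release: Our new product, 'Model-V', is launching."

def hop_2_retrieve(query: str) -> str:
    # Mocked second retrieval, run against a *new* query.
    return "Cognito Inc. is the main competitor to Model-V."

# Hop 1: find the press release and extract the product name.
context_1 = hop_1_retrieve("latest press release")
product = re.search(r"'([^']+)'", context_1).group(1)

# Hop 2: the output of hop 1 becomes the input of hop 2.
context_2 = hop_2_retrieve(f"competitors of {product}")
print(product, "->", context_2)
```

A single-shot retriever never performs that second call; everything we build below exists to automate it.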
What is a query decomposition graph?
We need our agent to stop trying to answer in one shot. We need it to Decompose the query.
We'll build an agent (using LangGraph) that can use an LLM to generate new questions for itself.
graph TD
A["User Query: 'Competitor to product in press release?'"] --> B(Hop 1: RAG)
B -- "Context: '...our new product, Model-V...'" --> C(LLM: Decompose & Generate)
C -- "Sub-Query: 'Who is the competitor to Model-V?'" --> D(Hop 2: RAG / Web Search)
D -- "Context: 'Cognito Inc. is the main competitor...'" --> C
C --> E["Final Answer: 'The press release mentions 'Model-V'. Its main competitor is Cognito Inc.'"]
style B fill:#e3f2fd,stroke:#0d47a1
style C fill:#e8f5e9,stroke:#388e3c
style D fill:#e3f2fd,stroke:#0d47a1
style E fill:#e8f5e9,stroke:#388e3c
This is a Multi-Hop agent. It uses the output of one retrieval step as the input for the next retrieval step.
How do you build a multi-hop graph?
We'll define a LangGraph GraphState that can hold a list of sub-questions and expand itself.
Brick 1: the "memory" (GraphState)
Our "memory" needs to be a list of questions to solve, and a list of facts we've found.
# filename: example.py
# description: Code example from the post.
from typing import TypedDict, List

from langgraph.graph import StateGraph, END


class GraphState(TypedDict):
    original_query: str
    questions: List[str]  # The "to-do" list of questions
    answers: List[str]    # The "done" list of facts
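Because `GraphState` is a plain `TypedDict`, a state update is just a dict merge. This sketch mimics what happens when a node returns a partial update (with default, non-reducer keys, LangGraph overwrites each returned key — shown here as a plain `dict` merge):

```python
from typing import TypedDict, List

class GraphState(TypedDict):
    original_query: str
    questions: List[str]  # the "to-do" list
    answers: List[str]    # the "done" list

state: GraphState = {
    "original_query": "Competitor to product in press release?",
    "questions": ["What is the latest press release?"],
    "answers": [],
}

# A node returns only the keys it changed; the graph merges them in.
update = {"questions": [], "answers": ["The press release mentions 'Model-V'."]}
state = {**state, **update}
print(state["answers"])
```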
Brick 2: the "planner" node
This is our new "brain." Its job is to take the user's query and decompose it into a step-by-step plan.
from openai import OpenAI
import json

llm_client = OpenAI()

# --- Node 1: The Planner (Query Decomposer) ---
def plan_queries(state):
    print("---NODE: PLAN_QUERIES---")
    prompt = f"""You are a research assistant.
Break down the following complex question into a series of
simple, searchable sub-questions.

Question: {state['original_query']}

Return a JSON object with a single key "questions" holding a list of strings.
Example: {{"questions": ["question 1", "question 2"]}}
"""
    response = llm_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        # JSON mode guarantees a parseable JSON *object*, not a bare list,
        # so the prompt asks for {"questions": [...]} and we unwrap it here.
        response_format={"type": "json_object"},
    )
    sub_questions = json.loads(response.choices[0].message.content)["questions"]
    return {"questions": sub_questions, "answers": []}
Observation: If we send "What is the competitor to the product in the latest press release?", this node will output a list like:
["What is the latest press release?", "Who is the competitor to the product mentioned?"]
Brick 3: the "executor" node (our RAG tool)
This node's job is to take one question from the "to-do" list, run our simple RAG pipeline on it, and add the fact to our "done" list.
# This is our simple RAG function from Post 1
def simple_rag_pipeline(query: str) -> str:
    # (Code to retrieve from vector store and generate an answer)
    return "The latest press release mentions 'Model-V'."  # Mocked answer

# --- Node 2: The Executor (Runs one sub-question) ---
def execute_search(state):
    print("---NODE: EXECUTE_SEARCH---")
    # Get the *first* question from the "to-do" list
    question = state["questions"][0]
    # "Pop" it off the to-do list
    remaining_questions = state["questions"][1:]
    # Run RAG on just that one question
    answer = simple_rag_pipeline(question)
    # Add the answer to our "done" list
    new_answers = state["answers"] + [answer]
    return {"questions": remaining_questions, "answers": new_answers}
Brick 4: the "wires" (the loop)
Now we wire it all together.
# --- The "Decision" Edge ---
def check_for_more_questions(state):
# This is our loop condition
if len(state["questions"]) > 0:
return "continue" # Go back to the executor
else:
return "end" # All done, go to the final answer
# --- Build the Graph ---
workflow = StateGraph(GraphState)
workflow.add_node("plan_queries", plan_queries)
workflow.add_node("execute_search", execute_search)
workflow.add_node("generate_final_answer", ...) # The final LLM call
workflow.set_entry_point("plan_queries")
workflow.add_edge("plan_queries", "execute_search")
# This is our Multi-Hop Loop!
workflow.add_conditional_edges(
"execute_search",
check_for_more_questions,
{
"continue": "execute_search", # Loop back to itself!
"end": "generate_final_answer"
}
)
workflow.add_edge("generate_final_answer", END)
app = workflow.compile()
Result: We've built an agent that can reason. It will "plan" its work, then "execute" that plan one step at a time, looping over the execute_search node until its "to-do" list is empty.
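The whole plan-then-execute loop can be simulated without LangGraph or an LLM. This self-contained sketch uses a mocked planner and mocked RAG answers (all queries and answers here are hypothetical) to show the loop mechanics end to end:

```python
def plan_queries(state):
    # Mocked planner: a real one asks the LLM to decompose the query.
    return {
        "questions": [
            "What is the latest press release?",
            "Who is the competitor to Model-V?",
        ],
        "answers": [],
    }

def execute_search(state):
    # Mocked RAG lookup, keyed on the sub-question.
    mock_rag = {
        "What is the latest press release?": "The press release mentions 'Model-V'.",
        "Who is the competitor to Model-V?": "Cognito Inc. is the main competitor.",
    }
    question = state["questions"][0]
    return {
        "questions": state["questions"][1:],
        "answers": state["answers"] + [mock_rag[question]],
    }

state = {"original_query": "Competitor to product in press release?"}
state.update(plan_queries(state))

hops = 0
while state["questions"]:  # the conditional edge's loop condition
    state.update(execute_search(state))
    hops += 1

print(hops, state["answers"])
```

Two hops in, two facts out — exactly the trace the compiled LangGraph app produces, minus the LLM calls.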
Challenge for you
- Use Case: Our `execute_search` node is simple. It always uses our `simple_rag_pipeline`.
- The Problem: What if one sub-question is "What is the CEO's name?" (internal data), but another is "What is our competitor's stock price?" (external data)?
- Your Task: How would you combine this post with our previous post? How could you add a Router inside the `execute_search` node to decide which tool (internal RAG vs. web search) to use for each sub-question?
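As a starting point for the challenge, here is one hedged sketch: a keyword-based router inside `execute_search` that picks a tool per sub-question. The `route_query` heuristic and both tool stubs are hypothetical; a production router would more likely use an LLM classifier, as in our previous post:

```python
def internal_rag(query: str) -> str:
    return f"[internal KB] answer to: {query}"

def web_search(query: str) -> str:
    return f"[web] answer to: {query}"

# Crude heuristic: company-specific wording goes to the internal KB.
INTERNAL_KEYWORDS = ("our", "ceo", "press release", "internal")

def route_query(query: str) -> str:
    q = query.lower()
    return "internal" if any(k in q for k in INTERNAL_KEYWORDS) else "web"

def execute_search(state):
    question = state["questions"][0]
    tool = internal_rag if route_query(question) == "internal" else web_search
    return {
        "questions": state["questions"][1:],
        "answers": state["answers"] + [tool(question)],
    }

state = {
    "questions": [
        "What is our CEO's name?",
        "What is the competitor's stock price?",
    ],
    "answers": [],
}
while state["questions"]:
    state.update(execute_search(state))
print(state["answers"])
```

The node signature is unchanged, so this drops into the existing graph without touching the planner or the conditional edges.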
Frequently asked questions
When should I implement multi-hop RAG instead of simple retrieval?
Implement multi-hop when simple RAG fails on compound queries that require multiple retrieval steps - like finding a product in a press release then searching for its competitors. Single-hop retrievers can't chain these together. If you're seeing consistent failures on multi-part questions in production, multi-hop's added complexity pays off against the cost of wrong answers.
How do I decompose queries into sub-questions for multi-hop agents?
Have the LLM generate sub-questions from the original query, store them in a to-do list, then execute each sequentially. LangGraph manages state between decomposition and retrieval phases. This requires no training data - the LLM is naturally good at breaking down multi-step problems. Loop through the to-do list until empty, collecting answers as you go.
How do I route different sub-questions to different retrieval tools?
Add a router node that decides which tool fits each sub-question - internal knowledge base for company data, web search for external research. Without routing, you waste latency and cost. The post's challenge walks through injecting routing into your execution loop, combined with error handling so each hop can fail and retry with alternatives.
For the full reference, see the LangGraph documentation.
Key takeaways
- Multi-hop queries require decomposition: Complex questions need to be broken into simpler sub-questions that can be answered sequentially
- State management enables iteration: Using a list of questions and answers in GraphState allows the agent to track progress through multiple retrieval steps
- The planner generates the roadmap: An LLM-based planner node decomposes complex queries into a series of searchable sub-questions
- The executor runs one step at a time: Each iteration of the executor answers one sub-question and updates the state
- Loops enable multi-step reasoning: Conditional edges that loop back to the executor create the multi-hop retrieval pattern
For more on advanced RAG patterns, see our advanced RAG guide and our agent framework comparison.
For more on building production AI systems, check out our AI Bootcamp for Software Engineers.
Take the next step
- RAG Fundamentals Workshop: Build production RAG pipelines from scratch
- Agentic RAG & Text-to-SQL Workshop: Advanced multi-hop retrieval with LangGraph
Ready to go deeper?
Go beyond articles. Build production AI systems with hands-on workshops and our intensive AI Bootcamp.