Your RAG pipeline can't reason because you built a straight line

Your first RAG pipeline is a straight line, not agentic, not planning anything: embed the question, pull 5 chunks, stuff them into a prompt, hit the LLM, return the answer. It works for easy questions. Then somebody asks "what did we ship last quarter and how does it compare to what the competitor shipped?" and the pipeline falls over. It retrieved chunks about your product. It did not think to run a second retrieval for the competitor. It did not rewrite the query. It did not decide to call a different tool. It just did what it always does.

The fix is to stop modeling RAG as a function and start modeling it as a graph. Nodes are steps. Edges are decisions. The LLM picks the next edge based on state. This is what agentic RAG means, and LangGraph is the cleanest way to build it.

This post is the pattern, the state, the nodes, and the roughly 80 lines of Python that turn your linear RAG into something that can plan, rewrite, and call tools. By the end you will be able to draw the graph, write the nodes, and decide which RAG workloads are worth the upgrade.

Why does linear RAG fail on real questions?

Because a real question is not a single retrieval problem. It is a sequence of retrieval decisions, and a linear pipeline cannot make decisions. It can only execute steps.

graph TD
    Q[What did we ship last quarter<br/>vs what the competitor shipped?] --> Linear[Linear RAG]
    Linear -->|one query| R1[Retrieve 5 chunks about us]
    R1 --> Bad[Half-answer: only our side]

    Q --> Agentic[Agentic RAG graph]
    Agentic --> Plan[Planner: split into 2 subtasks]
    Plan -->|us| RA[Retrieve what we shipped]
    Plan -->|them| RB[Retrieve competitor]
    RA --> Merge[Synthesize]
    RB --> Merge
    Merge --> Good[Complete answer]

    style Bad fill:#fee2e2,stroke:#b91c1c
    style Good fill:#dcfce7,stroke:#15803d

3 decisions a linear pipeline cannot make:

  1. Should I rewrite the query before retrieving? Vague user questions need rewording. Linear RAG never reworks the input.
  2. Should I retrieve from 2 different sources? A competitor comparison needs 2 retrievals against different indexes. Linear RAG has one retriever call.
  3. Should I call a tool instead of retrieving? "What is the current exchange rate" needs an API, not a vector store. Linear RAG has no concept of tools.

Agentic RAG models the whole pipeline as a graph where the LLM can take any of those branches based on what it sees. LangGraph is the library that makes that graph easy to write.

What is LangGraph and how does it model agentic RAG?

LangGraph is a small library on top of LangChain that lets you define a pipeline as a state graph. You declare a state schema (a TypedDict), a set of nodes (Python functions), and edges between them. A node reads from state, does work, and writes updates back. Edges can be conditional, routed by a function that looks at state.

The mental model: imagine the state as a dict that flows through the graph. Every node adds or updates keys. Conditional edges decide which node to visit next.

For agentic RAG, the state typically holds the question, any rewritten version, the retrieved chunks, the tool results, and the current draft answer. The nodes are planner, retriever, rewriter, tool caller, and generator. The edges route based on what the planner decides.

graph LR
    Start([question]) --> Plan[plan node]
    Plan -->|retrieve| Retrieve[retrieve node]
    Plan -->|rewrite| Rewrite[rewrite node]
    Plan -->|tool| Tool[tool node]
    Rewrite --> Retrieve
    Retrieve --> Grade[grade node]
    Tool --> Grade
    Grade -->|good| Generate[generate node]
    Grade -->|bad| Plan
    Generate --> End([answer])

    style Plan fill:#dbeafe,stroke:#1e40af
    style Grade fill:#fef3c7,stroke:#b45309
    style Generate fill:#dcfce7,stroke:#15803d

The graph has a loop. That is the whole point. If the grader says the retrieved context is weak, the plan node runs again with that signal in state and picks a different branch.

How do you build the state and nodes?

Start with the state. Keep it flat. Every field is something a node needs to read or write.

# filename: state.py
# description: The shared state that flows through the agentic RAG graph.
from typing import TypedDict, Literal

class RagState(TypedDict):
    question: str
    rewritten: str
    retrieved: list[str]
    tool_output: str
    next_step: Literal['retrieve', 'rewrite', 'tool', 'generate']
    answer: str
    attempts: int

Then the nodes. Each node is a plain function that takes state and returns a partial update. LangGraph merges the update into state automatically.

# filename: nodes.py
# description: The planner and three executor nodes of the agentic RAG graph.
import json
from anthropic import Anthropic
from app.retriever import retrieve

client = Anthropic()

PLAN_PROMPT = '''Look at the question and the current state. Decide the next
step: retrieve (search the vector store), rewrite (improve a vague query),
tool (call an external API), or generate (enough context to answer).
Output JSON only: {{"next_step": "..."}}.

Question: {question}
Retrieved so far: {retrieved}
Tool output: {tool_output}'''


def plan_node(state: RagState) -> dict:
    prompt = PLAN_PROMPT.format(
        question=state['question'],
        retrieved=state['retrieved'][-3:],
        tool_output=state['tool_output'],
    )
    reply = client.messages.create(
        model='claude-sonnet-4-6',
        max_tokens=200,
        messages=[{'role': 'user', 'content': prompt}],
    )
    decision = json.loads(reply.content[0].text)
    return {'next_step': decision['next_step'], 'attempts': state['attempts'] + 1}


def retrieve_node(state: RagState) -> dict:
    query = state['rewritten'] or state['question']
    chunks = retrieve(query)
    return {'retrieved': state['retrieved'] + chunks}


def rewrite_node(state: RagState) -> dict:
    prompt = f'Rewrite this question to be more specific: {state["question"]}'
    reply = client.messages.create(
        model='claude-haiku-4-5-20251001',
        max_tokens=200,
        messages=[{'role': 'user', 'content': prompt}],
    )
    return {'rewritten': reply.content[0].text.strip()}


def generate_node(state: RagState) -> dict:
    context = '\n'.join(state['retrieved']) + '\n' + state['tool_output']
    prompt = f'Answer using only this context:\n{context}\n\nQ: {state["question"]}'
    reply = client.messages.create(
        model='claude-sonnet-4-6',
        max_tokens=1024,
        messages=[{'role': 'user', 'content': prompt}],
    )
    return {'answer': reply.content[0].text}

4 nodes here, 3 of them short; the tool node follows the same shape and is left out to keep the listing tight. The plan node is the brain. Every other node is a dumb executor that does one thing and updates state.
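The tool node is not shown above. A minimal sketch of what it could look like, with a placeholder `run_tool` standing in for your real API or SQL call (both names here are illustrative, not part of any library):

```python
# A hypothetical tool node, following the same shape as the others:
# read state, do one thing, return a partial update.

def run_tool(query: str) -> str:
    # Placeholder: a real implementation would call an API or run SQL.
    return f"tool result for: {query}"


def tool_node(state: dict) -> dict:
    # Prefer the rewritten query when the rewriter has already run.
    query = state.get('rewritten') or state['question']
    return {'tool_output': run_tool(query)}
```

Wiring it in means adding the node to the builder, a 'tool' entry in the conditional-edge map, and an edge from the tool node back to plan.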

How do you wire the nodes into a LangGraph?

Use StateGraph, add nodes, add conditional edges that route on state['next_step'], and compile. The compiled graph is callable like a function.

# filename: graph.py
# description: Wire the nodes into an agentic RAG state graph.
from langgraph.graph import StateGraph, END
from state import RagState
from nodes import plan_node, retrieve_node, rewrite_node, generate_node

MAX_ATTEMPTS = 4

def route(state: RagState) -> str:
    if state['attempts'] > MAX_ATTEMPTS:
        return 'generate'
    # This minimal graph wires no tool node, so clamp anything outside
    # the three wired branches (including 'tool') to retrieve. Map
    # 'tool' to a real tool node once you add one.
    if state['next_step'] not in ('retrieve', 'rewrite', 'generate'):
        return 'retrieve'
    return state['next_step']

builder = StateGraph(RagState)
builder.add_node('plan', plan_node)
builder.add_node('retrieve', retrieve_node)
builder.add_node('rewrite', rewrite_node)
builder.add_node('generate', generate_node)

builder.set_entry_point('plan')
builder.add_conditional_edges('plan', route, {
    'retrieve': 'retrieve',
    'rewrite': 'rewrite',
    'generate': 'generate',
})
builder.add_edge('retrieve', 'plan')
builder.add_edge('rewrite', 'retrieve')
builder.add_edge('generate', END)

graph = builder.compile()

Read that bottom half carefully. retrieve loops back to plan so the model can decide whether to retrieve again, rewrite, or generate. rewrite goes straight to retrieve because rewriting without retrieving is pointless. generate is terminal. The MAX_ATTEMPTS rail in route prevents infinite loops.

To run it, call graph.invoke({'question': '...', 'retrieved': [], 'tool_output': '', 'rewritten': '', 'attempts': 0, 'next_step': 'retrieve', 'answer': ''}). You get back the final state with the answer.
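Typing out every key at each call site gets old fast. A small convenience helper (my addition, not part of LangGraph) keeps callers to a single argument:

```python
# Build a fresh initial RagState from just the question; every other
# field starts empty so the nodes can read it without KeyErrors.

def initial_state(question: str) -> dict:
    return {
        'question': question,
        'rewritten': '',
        'retrieved': [],
        'tool_output': '',
        'next_step': 'retrieve',
        'answer': '',
        'attempts': 0,
    }

# Usage: final = graph.invoke(initial_state('what did we ship last quarter?'))
```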

This pattern is the backbone of the Agentic RAG Masterclass, which walks through adding grading nodes, self-correction, and reranking on top of the base graph. If you are still building your first retrieval pipeline, the free RAG Fundamentals primer is the faster on-ramp.

How does planning differ from query rewriting in agentic RAG?

Planning is "what should I do next." Rewriting is "how should I phrase this query."

A planner decides between retrieve, rewrite, call a tool, or generate. It looks at the full state and picks a branch. It is a routing decision.

A rewriter takes a vague or ambiguous question and produces a better search query. "How does this work" becomes "how does the auth middleware validate session tokens in v4." It is a transformation of a single string.

You need both. Planning without rewriting means you retrieve garbage on vague questions. Rewriting without planning means you cannot handle multi-hop or comparison questions. The planner decides when to invoke the rewriter. That layering is what makes agentic RAG work on real workloads.

For a deeper look at the re-planning side specifically, see the Dynamic RAG: Re-Planning Retrieval Strategies Mid-Pipeline post. That one focuses on the iteration loop itself.

When should you add tool use to agentic RAG?

When the question cannot be answered from the corpus alone. Static knowledge bases answer static questions. Any question that needs live data, fresh numbers, or computation needs a tool.

3 concrete cases where tool use earns its added complexity:

  1. Current state questions. "What is the latest commit on main" or "what is the current exchange rate" cannot be precomputed. You need an API.
  2. Computed answers. "How much did we spend on LLM calls last week" needs a SQL query against a usage table, not a vector search.
  3. Structured lookups. "What is the status of ticket PROJ-1234" is a database read by key, not a fuzzy search.

In all 3 cases the tool is just another node in the graph. The planner decides when to route there. The tool node executes the API or SQL call and writes results back to state. The generate node uses that alongside any retrieved context.

When is agentic RAG worth the added cost?

Not always. The honest trade-off:

Workload                      Linear RAG   Agentic RAG
High-volume FAQ               Wins         Too expensive
Single-fact lookup            Wins         Overkill
Comparison, multi-hop         Mediocre     Wins
Vague user questions          Fails        Wins
Questions needing live data   Fails        Wins

The rule I use: if more than 20 percent of traffic is in the bottom 3 rows, build agentic RAG. Below that, route hard questions through an agentic graph and keep easy questions on a linear pipeline. A front-line classifier picks which path each question takes. This is how most production RAG systems ship both modes side by side.
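The front-line classifier can be a cheap model call. Purely as an illustration of the routing shape, here is a keyword-heuristic stand-in (marker list and function name are hypothetical; a real router would use a small LLM):

```python
# Hypothetical front-line router: hard questions go to the agentic
# graph, everything else stays on the cheap linear pipeline.

HARD_MARKERS = ('compare', ' vs ', 'versus', 'latest',
                'current', 'last quarter')


def route_question(question: str) -> str:
    # Pad with spaces so ' vs ' matches at the string edges too.
    q = f' {question.lower()} '
    return 'agentic' if any(m in q for m in HARD_MARKERS) else 'linear'
```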

For the full production picture of where agentic RAG fits alongside streaming, persistence, and observability, see the System Design: Building a Production-Ready AI Chatbot post.

What to do Monday morning

  1. Sample 50 questions from your eval set. Label each as simple, comparison, multi-hop, vague, or needs-live-data. Count the non-simple bucket.
  2. If the non-simple bucket is above 20 percent, build the 5-node graph from this post. Start with just plan, retrieve, and generate. Add rewrite and tool nodes once the baseline runs.
  3. Add the MAX_ATTEMPTS rail before your second test run. Every agentic RAG without it will eventually loop on a question the corpus cannot answer.
  4. Run both pipelines (linear and agentic) against the same eval set. Compare accuracy on the non-simple bucket specifically. Expect a lift of 15 to 25 points.
  5. Add a classifier in front to route simple questions to the linear pipeline and hard questions to the graph. This is the cheapest way to ship the technique without burning budget on every request.

The headline: agentic RAG is linear RAG plus a state graph and 5 nodes. LangGraph writes the loop for you. The work is in the planner prompt and the node functions, not the wiring.

Frequently asked questions

What is agentic RAG?

Agentic RAG is a retrieval pattern where an LLM-driven planner decides which retrieval, rewriting, or tool call to run next, based on the current state of the pipeline. Instead of a single retrieval pass, the pipeline is a graph with conditional edges that loop until the evidence is enough to answer. It handles comparison questions, multi-hop reasoning, and live-data lookups that linear RAG fails on.

How does LangGraph help build agentic RAG?

LangGraph provides a StateGraph abstraction that lets you declare nodes, edges, and a shared state schema. You write each step as a function that reads and updates state. Conditional edges route between nodes based on a routing function you supply. This replaces the manual while-loop of most agent code and gives you a diagrammable, testable pipeline out of the box.

What is the difference between query rewriting and planning in agentic RAG?

Planning decides the next action (retrieve, rewrite, call a tool, generate). Rewriting transforms a single query string into a better search query. Planning is routing; rewriting is string manipulation. You need both because planning without rewriting fails on vague questions and rewriting without planning fails on multi-hop questions. The planner invokes the rewriter when it decides the current query is too vague to retrieve well.

When should I use tool calls inside a RAG pipeline?

Whenever the question cannot be answered from the static corpus. Current state ("latest commit"), computed values ("usage last week"), and keyed lookups ("ticket PROJ-1234") all need a tool call, not a vector search. Agentic RAG lets the planner route to a tool node when it detects those question shapes in state. The tool result is merged into the context before the generate node runs.

How do I prevent an agentic RAG loop from running forever?

3 guardrails. A hard iteration ceiling (3 to 5 attempts is a good default). A check that the planner is not repeating the same action on identical state. A partial-answer fallback that runs the generate node with whatever evidence is available when the ceiling fires. Without these, a mis-routed planner will burn tokens on questions the corpus cannot answer.
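The second guardrail, repeat detection, fits in a few lines. A sketch using a hypothetical helper (you would keep the decision history in state and call this from the routing function, forcing 'generate' when it fires):

```python
# True when the planner has picked the same action `window` times in
# a row, which usually means it is stuck on unchanged state.

def is_stuck(decisions: list[str], window: int = 2) -> bool:
    return len(decisions) >= window and len(set(decisions[-window:])) == 1
```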

Key takeaways

  1. Linear RAG fails on comparison, multi-hop, and vague questions because a function cannot make routing decisions between retrieval steps.
  2. Agentic RAG models the pipeline as a state graph with a planner node that routes between retrieve, rewrite, tool call, and generate.
  3. LangGraph writes the loop for you. Define state, write node functions, add conditional edges, compile. The pattern fits in under 80 lines.
  4. Planning and rewriting are different jobs. Planning decides the next step; rewriting transforms the query. You need both.
  5. Add tool nodes when the question needs live data or computed values. The corpus cannot answer everything.
  6. To see this pattern wired into a full production agentic RAG stack with grading, reranking, and self-correction, walk through the Agentic RAG Masterclass, or start with the RAG Fundamentals primer.

For the official LangGraph concepts and tutorials, see the LangGraph documentation. The patterns in this post map directly onto the StateGraph API documented there.
