Sub-graphs in LangGraph for complex RAG queries

Your RAG graph has 18 nodes and you cannot find anything

You started with a clean 6-node RAG graph. Then you added a reranker (+1 node). Then grounding (+2 nodes). Then fallback paths (+3). Then query decomposition (+2). Then self-correction (+4). You are now at 18 nodes in one file. The diagram takes 4 seconds to render. Finding the node you need to edit takes 30 seconds. Every new feature adds friction.

The fix is the same pattern software engineers use when a function gets too big: split it. LangGraph has first-class support for sub-graphs. A sub-graph is a small, focused graph that compiles into a single node of a larger graph. The outer graph sees one node; the sub-graph encapsulates its own nodes, edges, and state transitions.

This post covers the sub-graph composition pattern, the state isolation rule that makes sub-graphs safe, and the 3 signals that tell you it is time to split.

Why does a single monolithic graph break at 15 nodes?

Because you lose the "single screen" property. A graph that fits on one screen is readable. A graph that scrolls is not. Beyond roughly 15 nodes, the cognitive load of tracking conditional edges across pages eats into whatever productivity the graph abstraction was supposed to give you.

3 concrete failure modes:

Routing logic becomes opaque. When an edge routes based on 4 state fields through a function deep in the file, you lose the ability to read the graph at a glance.
State fields multiply. Every new node adds 1 or 2 state fields. By node 15, your state TypedDict is a 20-field bag and no node uses more than 4 of them.
Testing becomes painful. You want to test the reranker node in isolation, but the state it reads from is built up by 5 earlier nodes. Stubbing all 5 to test 1 is high friction.

Sub-graphs fix all 3 by encapsulation. A sub-graph has its own small state, its own nodes, its own edges. The outer graph passes data in and receives data out. The internals are hidden unless you open the sub-graph.

graph TD
    Main[Main graph] --> Node1[rewrite]
    Node1 --> Node2[route]
    Node2 -->|simple| Sub1[retrieve_sub]
    Node2 -->|complex| Sub2[decompose_sub]
    Sub1 --> Node3[generate]
    Sub2 --> Node3
    Node3 --> End([answer])

    Sub1 -. encapsulates .-> Inner1[small graph<br/>3 nodes]
    Sub2 -. encapsulates .-> Inner2[small graph<br/>5 nodes]

    style Sub1 fill:#dbeafe,stroke:#1e40af
    style Sub2 fill:#dbeafe,stroke:#1e40af

The main graph has 5 visible nodes. The 2 sub-graphs hide 8 more behind them. Reading the main graph is fast, and when you need to understand the retrieval details, you open the sub-graph alone.

What is a sub-graph in LangGraph?

A sub-graph is a compiled StateGraph that you add as a single node in a larger StateGraph. From the outside, it looks like a regular node: input state goes in, output state comes out. From the inside, it is its own graph with its own nodes and edges.

The state mapping happens at the boundary. The sub-graph can have a different state schema from the outer graph, and you bridge between them with a small adapter function that you write. When the schemas differ, LangGraph does not infer the mapping for you.

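# filename: subgraph_basics.py
# description: A sub-graph is a compiled StateGraph used as a node.
# The inner state can differ from the outer state.
# search_vector and rerank_by_crossencoder are placeholders for your own
# retrieval and reranking helpers; they are not defined in this file.
from typing import TypedDict
from langgraph.graph import StateGraph, END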


class OuterState(TypedDict):
    question: str
    retrieved: list[str]
    answer: str


class InnerRetrievalState(TypedDict):
    query: str
    raw_chunks: list[str]
    reranked: list[str]


def inner_retrieve(state: InnerRetrievalState) -> dict:
    return {'raw_chunks': search_vector(state['query'])}


def inner_rerank(state: InnerRetrievalState) -> dict:
    return {'reranked': rerank_by_crossencoder(state['query'], state['raw_chunks'])}


def build_retrieval_subgraph():
    inner = StateGraph(InnerRetrievalState)
    inner.add_node('retrieve', inner_retrieve)
    inner.add_node('rerank', inner_rerank)
    inner.set_entry_point('retrieve')
    inner.add_edge('retrieve', 'rerank')
    inner.add_edge('rerank', END)
    return inner.compile()

build_retrieval_subgraph() returns a compiled graph. From the outer graph's perspective, this is just a node like any other. From the inside, it has its own 2-node retrieve-and-rerank flow.
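Because the inner state has only three fields, each inner node can be exercised with a hand-built dict and no outer graph at all. The sketch below illustrates that. It assumes stand-in versions of search_vector and rerank_by_crossencoder (a tiny in-memory corpus and a word-overlap score, both hypothetical) so that it runs without a vector store or a model.

```python
from typing import TypedDict


class InnerRetrievalState(TypedDict):
    query: str
    raw_chunks: list[str]
    reranked: list[str]


# Hypothetical stand-in data so the example runs offline.
CORPUS = [
    'LangGraph compiles a StateGraph into a runnable.',
    'Rerankers reorder retrieved chunks by relevance.',
    'Bananas are rich in potassium.',
]


def search_vector(query: str) -> list[str]:
    # Assumption: a real implementation would query a vector store.
    return list(CORPUS)


def rerank_by_crossencoder(query: str, chunks: list[str]) -> list[str]:
    # Assumption: word overlap stands in for a cross-encoder score.
    words = set(query.lower().split())

    def overlap(chunk: str) -> int:
        return len(words & set(chunk.lower().rstrip('.').split()))

    return sorted(chunks, key=overlap, reverse=True)


def inner_retrieve(state: InnerRetrievalState) -> dict:
    return {'raw_chunks': search_vector(state['query'])}


def inner_rerank(state: InnerRetrievalState) -> dict:
    return {'reranked': rerank_by_crossencoder(state['query'], state['raw_chunks'])}


# The rerank node reads only query and raw_chunks, so the test state is tiny.
state: InnerRetrievalState = {
    'query': 'how do rerankers reorder chunks',
    'raw_chunks': list(CORPUS),
    'reranked': [],
}
update = inner_rerank(state)
top_chunk = update['reranked'][0]
```

The same hand-built state would work for inner_retrieve. Nothing from the outer graph has to be stubbed.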

How do you add a sub-graph to a main graph?

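Wrap the compiled sub-graph in a small adapter function that maps between the outer and inner state schemas, and register that adapter with add_node. When the two graphs share state keys, you can instead pass the compiled sub-graph to add_node directly. The adapter is needed here because the schemas differ.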

# filename: main_graph.py
# description: Main RAG graph uses the retrieval sub-graph as one node.
# The adapter maps between outer and inner state.
# call_llm is a placeholder for your own LLM client call.
from langgraph.graph import StateGraph, END
from subgraph_basics import OuterState, build_retrieval_subgraph

retrieval_subgraph = build_retrieval_subgraph()


def retrieve_node(state: OuterState) -> dict:
    inner_result = retrieval_subgraph.invoke({
        'query': state['question'],
        'raw_chunks': [],
        'reranked': [],
    })
    return {'retrieved': inner_result['reranked']}


def generate_node(state: OuterState) -> dict:
    context = '\n'.join(state['retrieved'])
    return {'answer': call_llm(state['question'], context)}


main = StateGraph(OuterState)
main.add_node('retrieve', retrieve_node)
main.add_node('generate', generate_node)
main.set_entry_point('retrieve')
main.add_edge('retrieve', 'generate')
main.add_edge('generate', END)

graph = main.compile()

Read the adapter pattern in retrieve_node. It takes the outer question, builds an inner state, calls the sub-graph, and maps the inner result back to the outer state. This is the isolation boundary: the outer graph never knows the sub-graph has a raw_chunks field.
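One way to keep that boundary easy to test is to split the adapter into two pure mapping functions. The sketch below is an illustration under assumptions, not part of the original code: to_inner, from_inner, make_retrieve_node, and FakeSubgraph are hypothetical names, and FakeSubgraph only mimics the invoke method of a compiled sub-graph so the example runs without LangGraph installed.

```python
from typing import TypedDict


class OuterState(TypedDict):
    question: str
    retrieved: list[str]
    answer: str


class InnerRetrievalState(TypedDict):
    query: str
    raw_chunks: list[str]
    reranked: list[str]


def to_inner(state: OuterState) -> InnerRetrievalState:
    # Outer -> inner: only the question crosses the boundary.
    return {'query': state['question'], 'raw_chunks': [], 'reranked': []}


def from_inner(inner: InnerRetrievalState) -> dict:
    # Inner -> outer: only the reranked chunks come back out.
    return {'retrieved': inner['reranked']}


def make_retrieve_node(subgraph):
    # `subgraph` is anything with an invoke(dict) -> dict method,
    # such as the compiled graph from build_retrieval_subgraph().
    def retrieve_node(state: OuterState) -> dict:
        return from_inner(subgraph.invoke(to_inner(state)))

    return retrieve_node


class FakeSubgraph:
    # Hypothetical test double that mimics a compiled sub-graph.
    def invoke(self, inner: dict) -> dict:
        chunks = [f"chunk for {inner['query']}"]
        return {**inner, 'raw_chunks': chunks, 'reranked': chunks}


node = make_retrieve_node(FakeSubgraph())
update = node({'question': 'what is a sub-graph?', 'retrieved': [], 'answer': ''})
```

With the mapping isolated like this, a change to either schema touches to_inner or from_inner and nothing else.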

What is the state isolation rule?

Each sub-graph owns its own state schema. The outer state should only contain fields that are relevant to the outer flow. Fields that exist only to pass data between inner nodes live in the inner state and never leak out.

3 rules that make isolation work:

The outer state is for data that survives across sub-graphs. Question, retrieved chunks, final answer.
The inner state is for data that lives inside one sub-graph. Raw chunks before reranking, intermediate scores, working buffers.
The adapter function is the only place that knows both schemas. Everything else on either side is blind to the other.

Applying this rule keeps the outer state small. A main graph with 5 nodes and a 4-field state is readable. A main graph with 5 nodes and a 30-field state is not, even though it has the same node count.
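The isolation rule can be checked mechanically. The sketch below assumes the adapter pattern described here, where the outer and inner schemas are not supposed to share field names. shared_fields and LeakyOuterState are hypothetical names introduced only for illustration.

```python
from typing import TypedDict, get_type_hints


class OuterState(TypedDict):
    question: str
    retrieved: list[str]
    answer: str


class InnerRetrievalState(TypedDict):
    query: str
    raw_chunks: list[str]
    reranked: list[str]


class LeakyOuterState(OuterState):
    # Hypothetical bad schema: an inner working field has crept into the outer state.
    raw_chunks: list[str]


def shared_fields(outer: type, inner: type) -> set[str]:
    # get_type_hints returns the declared fields of a TypedDict, including inherited ones.
    return set(get_type_hints(outer)) & set(get_type_hints(inner))


clean = shared_fields(OuterState, InnerRetrievalState)
leaked = shared_fields(LeakyOuterState, InnerRetrievalState)
```

A check like this could run in a unit test, so that a leaked field shows up as a failing test before the outer state grows.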

When should you split a graph into sub-graphs?

3 signals that it is time to split:

The graph has more than 10 to 15 nodes in a single file. Readability drops off a cliff above this count.
A subset of nodes always runs in sequence and shares state fields that no other nodes touch. That subset is a natural sub-graph boundary.
You want to reuse a piece of the graph in more than one place. A reranker sub-graph can be called from both the main path and a retry path without duplication.

2 signals that you should NOT split:

You are below 10 nodes. Sub-graphs add ceremony. Stay monolithic until the node count justifies the split.
The nodes share state broadly. If every node reads and writes to the same 5 fields, the boundaries are not clean and splitting will require exposing internal state.
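The second split signal and the second do-not-split signal both come down to which state fields a group of nodes shares with the rest of the graph. A rough way to check a candidate boundary is sketched below. It assumes you can list, by hand, the fields each node reads or writes; the usage table and the boundary_report helper are hypothetical.

```python
def boundary_report(group: set[str], usage: dict[str, set[str]]) -> dict:
    # Fields touched by nodes inside the candidate group.
    inside = set().union(*(usage[name] for name in group))
    # Fields touched by every other node in the graph.
    outside = set().union(*(fields for name, fields in usage.items() if name not in group))
    return {
        'private': sorted(inside - outside),    # candidates for the inner state
        'interface': sorted(inside & outside),  # must cross the boundary via the adapter
    }


# Hypothetical field usage for a four-node graph.
usage = {
    'rewrite': {'question'},
    'retrieve': {'question', 'raw_chunks'},
    'rerank': {'question', 'raw_chunks', 'reranked'},
    'generate': {'question', 'reranked', 'answer'},
}

report = boundary_report({'retrieve', 'rerank'}, usage)
```

A group with several private fields and a short interface is a good candidate for a sub-graph. A group with no private fields is the broad-sharing case, where splitting buys little.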

For the full picture of how LangGraph graphs work, see the Visualizing RAG Pipelines with LangGraph StateGraph post. For an agentic RAG example that uses sub-graphs for grading and self-correction, the Agentic RAG Masterclass covers the pattern in depth. The free RAG Fundamentals primer is the right starting point if you are still building your first pipeline.

What does a realistic sub-graph decomposition look like?

A production RAG pipeline with 18 nodes typically splits into 3 sub-graphs plus a main graph:

Query preparation sub-graph: rewrite, classify, decompose. Takes a raw question, returns a list of structured queries. 4 to 5 nodes.
Retrieval sub-graph: vector search, reranker, quote extraction, grounding. Takes a query, returns a clean context. 4 to 6 nodes.
Generation sub-graph: draft, grade, self-correct, finalize. Takes a context and a question, returns an answer. 3 to 5 nodes.

The main graph becomes 4 nodes: query prep, retrieval, generation, fallback. The first three are sub-graph calls; fallback is a plain node. The whole 18-node pipeline is now a 4-node readable main graph plus 3 small focused sub-graphs.

Compared to the monolith: same total node count, much better organization, much easier to change any single sub-graph without touching the others.

How do you debug across sub-graph boundaries?

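Stream state at both levels. The outer graph's stream shows you which main-level node is running. The inner transitions do not appear in the outer stream by default, and the invoke call inside the adapter returns only the final inner state, so stream the sub-graph itself when you need to see its node transitions. Log both when debugging.

# filename: debug_subgraph.py
# description: Stream the outer graph, then replay the inner graph on the
# same question for full visibility when debugging a bad answer.
from main_graph import graph, retrieval_subgraph


def debug_invoke(question: str):
    initial = {'question': question, 'retrieved': [], 'answer': ''}
    # Outer stream: one step per main-level node, keyed by node name.
    for step in graph.stream(initial):
        for node_name, update in step.items():
            print(f'[outer] {node_name}: {list(update.keys())}')
    # invoke() inside retrieve_node hides the inner steps, so stream the
    # sub-graph directly. Note that this runs retrieval a second time.
    inner_initial = {'query': question, 'raw_chunks': [], 'reranked': []}
    for step in retrieval_subgraph.stream(inner_initial):
        for node_name, update in step.items():
            print(f'[inner] {node_name}: {list(update.keys())}')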
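To see the inner transitions without running the sub-graph twice, the adapter itself can stream instead of invoke. The sketch below is one possible shape, under two assumptions: the sub-graph's stream method yields {node_name: update} dicts (the behaviour of LangGraph's default "updates" stream mode), and the inner state uses plain overwrite semantics with no reducers. run_with_inner_trace and FakeSubgraph are hypothetical names, and FakeSubgraph stands in for a compiled sub-graph so the example runs on its own.

```python
from typing import Callable, Iterable


def run_with_inner_trace(subgraph, inner_input: dict,
                         log: Callable[[str], None] = print) -> dict:
    # Assumption: subgraph.stream(...) yields {node_name: update} dicts.
    state = dict(inner_input)
    for step in subgraph.stream(inner_input):
        for node_name, update in step.items():
            update = update or {}  # a node may return nothing
            log(f'[inner] {node_name}: {sorted(update)}')
            # Assumption: no reducers, so each update overwrites its fields.
            state.update(update)
    return state


class FakeSubgraph:
    # Hypothetical stand-in that mimics the stream output of a compiled sub-graph.
    def stream(self, inner: dict) -> Iterable[dict]:
        yield {'retrieve': {'raw_chunks': ['b', 'a']}}
        yield {'rerank': {'reranked': ['a', 'b']}}


lines: list[str] = []
final = run_with_inner_trace(
    FakeSubgraph(),
    {'query': 'q', 'raw_chunks': [], 'reranked': []},
    log=lines.append,
)
```

Inside retrieve_node, the returned state would take the place of the invoke result, so the outer update stays the same while the inner steps are logged.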

For production systems, wire both streams into your observability layer. Langfuse, Arize, and similar tools accept nested spans that let you see the inner graph execution inside the parent trace. For the Langfuse side specifically, see the Langfuse Integration for Agentic AI Tracing post.

What to do Monday morning

Count the nodes in your biggest RAG graph. If it is above 12, plan the split. If it is below 10, stay monolithic.
Identify the 2 or 3 natural groupings of nodes that always run together and share state. These are your sub-graph boundaries.
Extract each grouping into its own build_X_subgraph() function that returns a compiled StateGraph. Keep the inner state schema small and focused.
Write adapter functions that map between outer and inner state. Each adapter should be under 10 lines and live right next to the add_node call that registers it.
Stream both the outer and inner graphs during testing. Confirm the debugging experience is better, not worse, than the monolithic version. If worse, you split at the wrong boundary.

The headline: sub-graphs let you scale a LangGraph pipeline past 15 nodes without losing readability. The state isolation rule keeps the boundaries clean. Split when you hit the node count limit, not before.

Frequently asked questions

What is a sub-graph in LangGraph?

A sub-graph is a compiled StateGraph that acts as a single node inside a larger StateGraph. It has its own state schema, nodes, and edges, but from the outside it looks like a regular node: state goes in, state comes out. Sub-graphs let you encapsulate related steps and keep the main graph small as the total pipeline grows.

Why should I split a RAG pipeline into sub-graphs?

Because monolithic graphs become unreadable past 10 to 15 nodes. Sub-graphs let you keep the main graph small (under 10 nodes) while still supporting pipelines with 20+ total nodes. The encapsulation also improves testability, lets you reuse pieces across flows, and shrinks the outer state by hiding intermediate fields inside sub-graphs.

How does state isolation work between a main graph and a sub-graph?

Each graph has its own state schema. The adapter function (the node that calls the sub-graph) is the only place that knows both schemas. It takes outer state, builds inner state, invokes the sub-graph, and maps the result back to outer state. This keeps the outer state focused on what survives across sub-graphs and the inner state focused on intermediate data.

When should I not split a graph into sub-graphs?

When you have fewer than 10 nodes (monolithic is simpler), or when the nodes share state broadly and have no clean boundary to split on. Splitting a graph whose nodes all share the same 5 state fields will require leaking those fields into the sub-graph interface, which eliminates the isolation benefit.

How do I debug across sub-graph boundaries?

Stream state at both the outer and inner levels. Log the outer transitions to see which main-level node is running. Inside the adapter, optionally enable inner streaming to see sub-graph transitions. For production, use an observability tool that supports nested spans so the inner graph execution shows up as children of the parent node in your traces.

Key takeaways

Monolithic LangGraph pipelines become unreadable above 10 to 15 nodes. Sub-graphs are the natural way to scale past that limit.
A sub-graph is a compiled StateGraph used as a single node in a larger graph. It has its own state, nodes, and edges.
State isolation is the key property. Each sub-graph owns its schema; the adapter function is the only place that knows both sides.
Split when you have a natural grouping of nodes that always runs together and shares state no other nodes touch. Do not split pre-emptively.
A typical production RAG pipeline decomposes into 3 sub-graphs (prep, retrieval, generation) plus a 4-node main graph. Same total nodes, much better organization.
To see sub-graphs wired into a full production agentic RAG stack with planning, grading, and self-correction, walk through the Agentic RAG Masterclass, or start with the RAG Fundamentals primer.

For the full LangGraph documentation on sub-graph composition and the state-mapping API, see the LangGraph sub-graphs guide. The examples there cover more advanced patterns like dynamic sub-graph dispatch.
