Your LangChain chain was fine until you needed a loop

You built your first agent with a LangChain chain. Prompt template, LLM, output parser, done. It worked for one-shot queries. Then you wanted the agent to retry with different retrieval if the first attempt was weak. Then you wanted branching: call a tool sometimes, answer directly other times. Then you wanted to pause for human approval. Every one of those features required monkey-patching the chain, wrapping it in outer loops, or giving up and writing a plain while loop that barely resembles LangChain anymore.

This is the wall LangChain chains hit on real agents. Chains are linear function compositions. Agents are state machines with loops, branches, and pause-resume semantics. You cannot model a state machine as a function composition without bending it into an unrecognizable shape.

LangGraph is the library built to replace chains for agent workloads. State graphs instead of function chains. Loops and conditionals as first-class primitives. Persistence via checkpointers. This post is why LangGraph is the right abstraction, what a stateful agent looks like in practice, and the migration path from a linear chain to a stateful graph.

Why do linear chains fail on real agent workloads?

Because chains model pipelines, not state machines. A chain is a → b → c → d. An agent is "do a. if a says retry, go back to a with new input. if a says tool, run the tool and loop. if a says done, return." The control flow has cycles, conditionals, and re-entry. Chains have none of those.

3 things chains cannot express cleanly:

  1. Loops. A planner that refines its own plan needs to loop back to itself. A chain runs forward once and stops.
  2. Conditional branching. Different LLM outputs should lead to different next steps. A chain picks one path at construction time.
  3. Pause and resume. Human-in-the-loop approvals need the pipeline to stop, wait, and continue later with injected state. Chains do not have pause points.

You can work around all 3 by wrapping the chain in imperative Python. At that point you have written the loop yourself and LangChain is only doing the LLM call. That is fine as a starting point but it scales poorly as the agent grows.
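What that imperative workaround looks like, as a minimal sketch. Here `run_chain` is a hypothetical stand-in for the chain invocation; a real version would also dispatch tool calls:

```python
# A hand-rolled agent loop around a linear "chain": the retry/done
# control flow lives in imperative Python, and the framework only
# does the LLM call inside run_chain.
def run_chain(task: str, attempt: int) -> dict:
    # Stub: pretend the model asks for one retry, then finishes.
    if attempt == 0:
        return {"action": "retry", "output": ""}
    return {"action": "done", "output": f"answer for {task}"}


def manual_agent(task: str, max_attempts: int = 5) -> str:
    for attempt in range(max_attempts):
        result = run_chain(task, attempt)
        if result["action"] == "done":
            return result["output"]
        # "retry" falls through and loops; a real agent would also
        # need a "tool" branch here, and a "pause for approval" branch.
    raise RuntimeError("agent did not finish")
```

This works, but every new behavior (branching, tool dispatch, approval pauses) grows this function instead of living in the framework.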

graph LR
    Chain[Linear chain] -->|a| B[b]
    B -->|forward| C[c]
    C --> D[d]

    Graph[State graph] --> SA[node a]
    SA -->|case X| SB[node b]
    SA -->|case Y| SC[node c]
    SB --> SA
    SC --> End([end])

    style Chain fill:#fee2e2,stroke:#b91c1c
    style Graph fill:#dcfce7,stroke:#15803d

The chain is a straight line. The graph has loops and branches, which are exactly what real agent logic needs.

What is a stateful agent in LangGraph?

An agent expressed as a graph where nodes are functions, edges are transitions, and state is a shared dict that flows through the graph. The state carries information between nodes. Conditional edges route based on state. Loops are just edges that point backward.

LangGraph is the library that provides the graph runtime. You define state, nodes, and edges. LangGraph handles execution, conditional routing, checkpointing, and observability.

# filename: stateful_agent.py
# description: A stateful agent as a LangGraph StateGraph. Plan, act,
# observe, decide, loop or finish.
from typing import TypedDict, Literal
from langgraph.graph import StateGraph, END


class AgentState(TypedDict):
    task: str
    plan: list[str]
    observations: list[str]
    step: int
    done: bool
    answer: str


def plan_node(state: AgentState) -> dict:
    # make_plan, execute, and summarize are app-specific helpers
    # (LLM calls, tool execution) assumed to be defined elsewhere.
    return {'plan': make_plan(state['task'])}


def act_node(state: AgentState) -> dict:
    next_step = state['plan'][state['step']]
    obs = execute(next_step)
    return {
        'observations': state['observations'] + [obs],
        'step': state['step'] + 1,
    }


def decide_node(state: AgentState) -> dict:
    if state['step'] >= len(state['plan']):
        return {'done': True, 'answer': summarize(state['observations'])}
    return {'done': False}


def route(state: AgentState) -> Literal['act', 'end']:
    return 'end' if state['done'] else 'act'


builder = StateGraph(AgentState)
builder.add_node('plan', plan_node)
builder.add_node('act', act_node)
builder.add_node('decide', decide_node)
builder.set_entry_point('plan')
builder.add_edge('plan', 'act')
builder.add_edge('act', 'decide')
builder.add_conditional_edges('decide', route, {'act': 'act', 'end': END})

graph = builder.compile()

Read the add_conditional_edges call. That is the loop. decide routes to either act (loop back and execute the next plan step) or END (finish the agent). No imperative while loop. No monkey-patching. The loop is part of the graph.
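To make the loop concrete, here is the same plan-act-decide control flow traced in plain Python. The helpers `make_plan`, `execute`, and `summarize` are hypothetical stubs for the app-specific LLM and tool calls; the graph runtime executes this trace for you, one checkpointed step at a time:

```python
# A stdlib trace of what the compiled graph executes: plan once,
# then loop act -> decide until decide sets done=True.
def make_plan(task: str) -> list[str]:
    return [f"{task}: step {i}" for i in range(3)]  # stub planner

def execute(step: str) -> str:
    return f"did {step}"                            # stub tool call

def summarize(observations: list[str]) -> str:
    return "; ".join(observations)                  # stub summarizer

def run(task: str) -> str:
    state = {"task": task, "plan": [], "observations": [],
             "step": 0, "done": False, "answer": ""}
    state["plan"] = make_plan(state["task"])        # plan node
    while True:
        nxt = state["plan"][state["step"]]          # act node
        state["observations"].append(execute(nxt))
        state["step"] += 1
        if state["step"] >= len(state["plan"]):     # decide node
            state["done"] = True
            state["answer"] = summarize(state["observations"])
        if state["done"]:                           # route -> END
            return state["answer"]
        # route -> act: loop back for the next plan step
```

Note the initial state: `step` starts at 0 and `observations` at an empty list, which is what an initial invocation of the compiled graph needs as well.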

For the deeper walkthrough of graph construction and visualization, see the Visualizing RAG Pipelines with LangGraph StateGraph post. For the persistence layer that makes pause-resume possible, see LangGraph Persistence: Why Production Agents Need Thread Models.

What can you do with a stateful graph that you can't do with a chain?

4 capabilities that chains cannot match:

  1. Adaptive loops. The agent can revise its own plan mid-execution based on what it observed. Every iteration gets access to the full state, not just the previous output.
  2. Multi-path branching. Different LLM decisions lead to different downstream nodes. A chain would have to construct one path; the graph routes dynamically.
  3. Checkpointing and resume. LangGraph persists state at every node transition. You can pause an agent, change its state, and resume it from the exact point it stopped. This is how human-in-the-loop approvals work.
  4. Parallel execution. Multiple nodes can run concurrently and merge their results. A chain is strictly sequential.

Each capability solves a real production need. Adaptive loops are how agentic RAG re-plans. Multi-path branching is how a router dispatches to different tools. Checkpointing is how approval workflows pause. Parallel execution is how hybrid retrievers run vector and graph searches at the same time.
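The fan-out/fan-in pattern in capability 4 can be sketched with stdlib concurrency. The two retrievers here are hypothetical stubs; the point is that each branch returns a partial update touching only its own key, and the merge step combines them:

```python
# Fan-out: two retrieval "nodes" run concurrently.
# Fan-in: their partial state updates merge back into shared state.
from concurrent.futures import ThreadPoolExecutor

def vector_search(q: str) -> dict:
    return {"vector_hits": [f"vec:{q}"]}   # stub vector retriever

def graph_search(q: str) -> dict:
    return {"graph_hits": [f"kg:{q}"]}     # stub graph retriever

def fan_out(q: str) -> dict:
    state: dict = {"question": q}
    with ThreadPoolExecutor() as pool:
        updates = pool.map(lambda node: node(q), [vector_search, graph_search])
    for update in updates:
        state.update(update)               # merge partial updates
    return state
```

A chain cannot express this at all: it has exactly one value flowing through exactly one path.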

How does the state model differ from a chain's input/output?

A chain passes a single value (or a small dict) from step to step. A graph passes a typed state dict that every node can read from and write to independently. Nodes only update the fields they care about; LangGraph merges updates into the next state automatically.

The state model is where the real abstraction lives. 3 properties:

  1. Typed. Use TypedDict or a Pydantic model. The type checker catches bad accesses.
  2. Flat. Nested state is hard to inspect during debugging. Put each concept on its own top-level key.
  3. Mergeable. LangGraph merges partial updates automatically. A node that returns {'answer': '...'} does not have to re-specify every other field.

# filename: state_design.py
# description: A flat typed state with one field per concept.
# Nodes return partial dicts; LangGraph merges them.
from typing import TypedDict

class GoodState(TypedDict):
    question: str
    retrieved: list[str]
    draft: str
    grade: str
    answer: str

# Bad: nested and hard to debug
class BadState(TypedDict):
    data: dict  # opaque blob


def ok_node(state: GoodState) -> dict:
    return {'draft': 'a new draft'}  # only updates one field

A well-designed state makes debugging easy because you can read the "current state of the pipeline" at any node transition and understand everything.
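The merge behavior in property 3 can be modeled in a few lines. This is a simplified last-value-wins merge (LangGraph's default for plain fields; list-accumulating fields need a reducer, which is not shown here):

```python
# A simplified model of how a graph runtime folds each node's partial
# update into the shared state: later values win, untouched keys carry over.
def merge(state: dict, update: dict) -> dict:
    return {**state, **update}

state = {"question": "q", "retrieved": [], "draft": "",
         "grade": "", "answer": ""}
state = merge(state, {"retrieved": ["doc1", "doc2"]})  # retriever node
state = merge(state, {"draft": "first draft"})         # writer node
# Each node touched only its own key; everything else carried over.
```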

How do you add persistence with checkpointers?

One line at compile time. LangGraph provides a SqliteSaver or PostgresSaver that automatically persists state at every node transition. You configure it once and every invocation is checkpointable.

# filename: checkpointed.py
# description: Compile the graph with a persistence layer so state
# can be resumed from any transition.
from langgraph.checkpoint.sqlite import SqliteSaver
from stateful_agent import builder

checkpointer = SqliteSaver.from_conn_string('./agent.db')
# Note: in recent langgraph-checkpoint-sqlite releases, from_conn_string
# is a context manager; use it as
# `with SqliteSaver.from_conn_string('./agent.db') as checkpointer:`.
graph = builder.compile(checkpointer=checkpointer)

config = {'configurable': {'thread_id': 'user-123'}}
result = graph.invoke(initial_state, config=config)

# Later, resume from the same thread
later = graph.invoke(None, config=config)

The thread_id identifies a specific agent run. Multiple threads can run in parallel in the same process (or across processes), each with its own state. Because the checkpointer saves state after every node, execution can stop at any transition; resuming means calling invoke again with the same thread ID.

For the deeper dive on the persistence layer and thread models, see the LangGraph Persistence: Why Production Agents Need Thread Models post. For the full agentic RAG example that uses stateful graphs, see Agentic RAG with LangGraph: Planning, Rewriting, and Tool Use.

When should you migrate from LangChain chains to LangGraph?

3 signals that it is time:

  1. Your chain has an outer while loop you wrote manually. That loop belongs inside the graph as a conditional edge.
  2. You have if branches in a chain that pick different sub-chains based on model output. Those belong as conditional edges.
  3. You need pause-resume or human-in-the-loop. Chains do not support this; graphs do natively via checkpointers.
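Signal 2 in practice, as a minimal sketch. The classifier and sub-chains are hypothetical stubs; the point is that an imperative `if` over model output is graph routing in disguise:

```python
# Before: an if-branch picks between two "sub-chains" based on a
# classification of the input. This branch is routing logic.
def classify(q: str) -> str:
    return "math" if any(c.isdigit() for c in q) else "chat"  # stub classifier

def math_chain(q: str) -> str:
    return f"math:{q}"                                        # stub sub-chain

def chat_chain(q: str) -> str:
    return f"chat:{q}"                                        # stub sub-chain

def chain_style(q: str) -> str:
    if classify(q) == "math":   # the branch a graph would own
        return math_chain(q)
    return chat_chain(q)

# After: the same branch expressed as a routing function, the shape
# add_conditional_edges expects (state in, edge name out).
def route_question(state: dict) -> str:
    return "math" if classify(state["question"]) == "math" else "chat"
```

In the graph version, `route_question` would be wired with something like `add_conditional_edges('classify', route_question, {'math': 'math_node', 'chat': 'chat_node'})`, and each sub-chain becomes a node.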

One signal that it is NOT time: your agent is a one-shot classify-and-respond pipeline with no retries, no branches, and no loops. Chains are simpler for truly linear flows and you should not migrate for the sake of migrating.

For the full production agent stack with graphs, persistence, tools, and observability, the Build Your Own Coding Agent course walks through it module by module. The free AI Agents Fundamentals primer is the right starting point if the agent loop concept is still new.

What to do Monday morning

  1. Audit your current agent code. If you have a while loop wrapping a LangChain chain, or if branches picking different chains, that is a graph waiting to happen.
  2. Sketch the graph on paper: nodes are the steps, edges are the transitions, state is the dict that flows through. This sketch is the spec for the migration.
  3. Convert the sketch into a StateGraph with add_node and add_edge calls. Keep the state schema flat and typed.
  4. Add a checkpointer if you need pause-resume, even just for debugging. Resuming a graph from mid-execution is the best debugging experience in agent development.
  5. Run the graph against your existing eval set. Expect parity on simple cases and a meaningful lift on cases that needed loops or branches.

The headline: LangChain chains are linear function compositions. Real agents are state machines. LangGraph is the library that makes state machines cheap to build, debug, and ship. Migrate when you outgrow the chain abstraction, not before.

Frequently asked questions

What is the difference between LangChain chains and LangGraph graphs?

Chains are linear function compositions that run forward once. Graphs are state machines with nodes, conditional edges, and persistent state. Chains work for one-shot pipelines. Graphs handle loops, branching, and pause-resume, which are exactly what real agent workloads need. LangGraph is the purpose-built library for graphs; it supersedes LangChain's Agent abstractions for production.

Why can't LangChain chains express agent loops cleanly?

Because chains are a → b → c pipelines that run forward and return. They cannot loop back to themselves, branch on output, or pause for human input. You can work around this with imperative Python wrappers, but at that point you have written the loop yourself and LangChain is only doing the LLM call. A state graph models the same logic natively.

What is a stateful agent in LangGraph?

A stateful agent is a graph where nodes are functions that read and update a shared state dict. The state carries information between nodes, and conditional edges route execution based on state. This is the natural shape for agents that plan, act, observe, and decide what to do next based on what they saw. LangGraph provides the runtime.

How does LangGraph handle persistence for long-running agents?

Through checkpointers. A checkpointer (SQLite, Postgres, or custom) saves state at every node transition. Each agent run has a thread ID that identifies it, and multiple threads can run in parallel. You can pause an agent by returning from a node and resume it later by calling invoke again with the same thread ID. This is how human-in-the-loop approvals work.

When should I migrate from LangChain chains to LangGraph?

When you have outer while loops, conditional branches, or pause-resume requirements. Those are all signs you outgrew the chain abstraction. If your agent is a one-shot classify-and-respond pipeline with no retries or branches, a chain is simpler and you should not migrate. Migrate when the chain stops fitting the shape of your agent logic.

Key takeaways

  1. Chains are linear function compositions. Agents are state machines. You cannot model a state machine as a chain without bending it into an unrecognizable shape.
  2. LangGraph replaces chains for agent workloads. Nodes, conditional edges, and shared state are first-class. Loops, branches, and pause-resume work natively.
  3. Keep the state flat and typed. Nested state hides information during debugging. One concept per top-level key is the right design.
  4. Add a checkpointer to enable pause-resume and mid-execution debugging. Resuming a graph from any point is the best debugging experience in agent development.
  5. Migrate when you hit loops, branches, or pause-resume needs. Do not migrate a truly linear one-shot pipeline.
  6. To see stateful agents wired into a full production stack with persistence, tools, and observability, walk through the Build Your Own Coding Agent course, or start with the AI Agents Fundamentals primer.

For the full LangGraph documentation with state design patterns, checkpointers, and multi-agent examples, see the LangGraph docs. The "Why LangGraph" page there covers the same argument in more depth.
