LangGraph persistence: thread models for production agents
Your LangGraph agent forgot everything when the server restarted
You built a multi-step agent in LangGraph. It plans, calls tools, refines, and replies. On your laptop it remembers context across messages because the in-memory checkpointer is doing its job. You ship it to production behind 2 replicas and the first deploy goes out fine. Then a user sends message 3 of a long debugging session, the request lands on the other replica, and the agent has no idea what they were talking about.
This is the persistence cliff. LangGraph makes the in-memory case so smooth that you forget there is a memory at all, and then production reminds you. The fix is not exotic, but the parts most tutorials skip (thread models, the difference between checkpointer and store, what actually survives a restart) are the parts that decide whether your agent is shippable.
This post is the production checklist for LangGraph state. We will look at why in-memory checkpointers cannot survive a real deployment, what a thread is, how the postgres checkpointer actually works, and the exact wiring you should ship on day one.
Why in-memory checkpointers stop working in production
The default MemorySaver checkpointer is a Python dict. Every conversation lives in your worker's RAM. That works perfectly until any of the following happens:
```mermaid
graph TD
    User[User sends message 3] --> LB[Load Balancer]
    LB -->|round robin| R1[Replica A - has thread state]
    LB -->|round robin| R2[Replica B - empty memory]
    R1 --> Good[Agent recalls context]
    R2 --> Bad["Agent: who are you?"]
    Restart[Deploy or crash] --> R1
    Restart --> R2
    Restart -->|RAM gone| Wipe[All threads lost]
    style Bad fill:#fee2e2,stroke:#b91c1c
    style Wipe fill:#fee2e2,stroke:#b91c1c
```
3 failure modes, each one guaranteed in any real deployment:
- 2 replicas behind a load balancer. Conversation state on one replica is invisible to the other. The user sees the agent forget them every other message.
- A redeploy. Containers restart, RAM is wiped, every in-flight conversation evaporates.
- A crash. Same as a redeploy except angrier and with no warning.
The fix is to push state out of the worker's memory and into a shared store that survives restarts and is visible to every replica. In LangGraph, that store is a checkpointer backed by a real database.
What is a thread in LangGraph?
A thread is a single conversation. It has an ID (thread_id), a sequence of state checkpoints (one per step the graph took), and a config dict that the runtime passes around to identify which conversation it is operating on. When you call graph.invoke(state, config={'configurable': {'thread_id': 'abc-123'}}), you are telling LangGraph "this state belongs to thread abc-123, save it under that key, load it under that key next time."
The thread is the unit of persistence. Everything LangGraph saves is scoped by thread. If you do not pass a thread_id, there is no key to save under or load from (a graph compiled with a checkpointer will refuse the call outright), which means no memory at all. Forgetting to thread state through the config is the most common reason people think LangGraph "lost" their agent's memory. It never had it.
A useful mental model: a thread is what your application calls a "chat session." It maps one-to-one with whatever your frontend would call a conversation. Most production apps generate a UUID per conversation, store it on the user's session, and pass it into LangGraph on every call.
How does LangGraph persistence actually work?
The checkpointer is a write-after-every-node abstraction. After each node in your graph runs, LangGraph serializes the current state, stamps it with a checkpoint ID, and writes the whole blob to the configured backend under the thread's key. On the next call with the same thread_id, it loads the most recent checkpoint and resumes from that state.
2 important properties fall out of this:
- The agent can resume mid-graph after a crash. If your worker died between node 5 and node 6, the next request with the same `thread_id` will pick up from the checkpoint after node 5, not from scratch.
- You can rewind. Because every step is checkpointed, you can list all checkpoints for a thread and re-invoke the graph from any of them. This is what LangGraph Studio uses for time travel debugging.
The checkpointer is not the same as the store. The store is for long-term, cross-thread memory (think: "the user's preferred name"). The checkpointer is for the working state of a single thread. Most production apps need both, but they are configured separately and serve different purposes.
How do you wire the Postgres checkpointer for production?
Use AsyncPostgresSaver from langgraph.checkpoint.postgres.aio. It is the async, production-grade backend. You initialize it inside your FastAPI lifespan so each worker has a clean connection pool, then pass it into your compiled graph.
```python
# filename: app/graph.py
# description: A LangGraph agent with persistent state in postgres.
# Each request resumes from the latest checkpoint for its thread.
from contextlib import asynccontextmanager
from typing import Annotated, TypedDict

from fastapi import FastAPI
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
from langgraph.graph import END, StateGraph
from langgraph.graph.message import add_messages

from app.config import get_settings
from app.nodes import act_node, plan_node, should_continue  # your node functions

settings = get_settings()

class AgentState(TypedDict):
    # add_messages appends new messages to the channel instead of overwriting it
    messages: Annotated[list, add_messages]

def build_graph(checkpointer):
    graph = StateGraph(AgentState)
    graph.add_node('plan', plan_node)
    graph.add_node('act', act_node)
    graph.set_entry_point('plan')
    graph.add_conditional_edges('plan', should_continue, {'act': 'act', 'end': END})
    graph.add_edge('act', 'plan')
    return graph.compile(checkpointer=checkpointer)

@asynccontextmanager
async def lifespan(app: FastAPI):
    async with AsyncPostgresSaver.from_conn_string(settings.database_url) as saver:
        await saver.setup()  # creates the checkpoint tables on first run
        app.state.agent = build_graph(saver)
        yield  # must stay inside the async with, or the pool closes early

app = FastAPI(lifespan=lifespan)
```
3 things in here matter and they are all the things tutorials get wrong. First, saver.setup() is the call that creates the checkpoint tables in postgres. Skip it on a fresh database and the first request crashes with a missing-table error. Run it idempotently on every startup (it is safe). Second, AsyncPostgresSaver.from_conn_string is a context manager. The async with is mandatory; the connection pool only exists inside the block. Third, the checkpointer lives on app.state, not as a module global. Each Uvicorn worker gets its own instance, which is exactly what you want.
For the broader pattern of where lifespan-managed clients fit in a production agent service, see the FastAPI and Uvicorn for Production Agentic AI Systems post. The same lifespan context that holds your LLM client should hold your checkpointer.
How do you use threads in a real request?
Pass a thread_id in the config every time you invoke the graph. The thread_id is whatever identifier your application uses for a conversation: a UUID stored on the user's session, a chat session row in your own database, anything stable.
```python
# filename: app/routes/chat.py
# description: Each request loads the agent's prior state by thread_id
# and writes the new state back when the graph finishes.
from uuid import uuid4

from fastapi import APIRouter, Request

router = APIRouter()

@router.post('/chat')
async def chat(payload: dict, request: Request):
    agent = request.app.state.agent
    thread_id = payload.get('thread_id') or str(uuid4())
    config = {'configurable': {'thread_id': thread_id}}
    result = await agent.ainvoke(
        {'messages': [{'role': 'user', 'content': payload['message']}]},
        config=config,
    )
    return {'thread_id': thread_id, 'reply': result['messages'][-1].content}
```
The behavior to internalize: when the request handler returns, every state change made by every node in the graph is already in postgres. There is no separate "save" call. The checkpointer wrote it after each node ran. If the next request from this user lands on a different replica, that replica reads the same checkpoints from the same database and continues exactly where the first one left off.
If you want to see this same pattern wired into a more complete agent loop with streaming, planning, and tool calls, the Agentic RAG Masterclass walks through it end to end. The free AI Agents Fundamentals resource is a good starting point if you are still deciding whether your agent needs LangGraph at all.
Why do you need a thread model on top of the checkpointer?
The checkpointer stores state. It does not know that a thread has a name, when it was created, who owns it, or whether it should be visible in a "your conversations" list. For all of that, you need your own Thread row in your application database.
This is the part the LangGraph tutorials skip and the part every team eventually rebuilds. A thin Thread model with these fields covers most needs:
| Field | Why it exists |
|---|---|
| `id` (UUID, PK) | Used as the LangGraph `thread_id` |
| `user_id` (FK) | So a user can list only their own threads |
| `title` | Generated from the first user message for display |
| `created_at` | Sort order in the UI; cleanup of stale threads |
| `last_message_at` | For "recently active" sorting |
| `archived` (bool) | Soft delete without losing checkpoints |
The checkpoint rows LangGraph writes do not have any of those columns. They carry a thread_id, the serialized checkpoint, and a few internal fields, and the exact table layout is an implementation detail of the saver that has changed across versions. Trying to read user-facing metadata out of those tables will lead you to write SQL you regret. Keep your thread metadata in your own table and let LangGraph own the checkpoint blob.
The contract between the 2: when a user creates a chat, you INSERT a row into threads, generate a UUID, and pass that UUID into LangGraph as the thread_id. When they delete a chat, you DELETE the threads row and the checkpoint rows for that thread_id. The 2 tables share a key but never share data.
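The delete path as SQL, assuming the table names the current AsyncPostgresSaver creates (`checkpoints`, `checkpoint_blobs`, `checkpoint_writes`); verify them against the schema your installed version actually produced:

```sql
BEGIN;
DELETE FROM checkpoint_writes WHERE thread_id = $1;
DELETE FROM checkpoint_blobs  WHERE thread_id = $1;
DELETE FROM checkpoints       WHERE thread_id = $1;
DELETE FROM threads           WHERE id = $1::uuid;
COMMIT;
```

Recent langgraph releases also expose a `delete_thread` / `adelete_thread` method on checkpointers; if your version has it, prefer that over hand-written SQL against LangGraph's tables.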
What to do Monday morning
- Audit every place in your code that calls `graph.invoke` or `graph.ainvoke`. If any of them omit `config={'configurable': {'thread_id': ...}}`, fix it. That is your "agent forgets everything" bug.
- Replace `MemorySaver` with `AsyncPostgresSaver`. Move the initialization into your FastAPI lifespan. Run `await saver.setup()` once, then ship.
- Create a `threads` table in your application database with the columns above. Generate the `thread_id` there, not in LangGraph. Pass it through.
- Test the failure mode you actually fear: kill your container mid-conversation, bring it back up, send another message with the same `thread_id`, and confirm the agent picks up where it left off. If it does not, your checkpointer is misconfigured.
- Add a cleanup job that deletes checkpoints older than your retention window. The checkpoint table grows fast for chatty agents and nobody cleans it up by default.
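A retention job can be a single scheduled statement. This sketch assumes the stock AsyncPostgresSaver table names and a `threads` table with a `last_message_at` column; check both against your actual schema:

```sql
-- Nightly job: drop checkpoints for threads idle past the retention window.
-- Repeat the DELETE for checkpoint_blobs and checkpoint_writes.
WITH stale AS (
    SELECT id::text AS thread_id
    FROM threads
    WHERE last_message_at < now() - interval '90 days'
)
DELETE FROM checkpoints
WHERE thread_id IN (SELECT thread_id FROM stale);
```

Run it from whatever scheduler you already have (pg_cron, a Celery beat task, a Kubernetes CronJob); the important part is that something runs it at all.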
The end state is an agent that survives restarts, scales horizontally without a sticky-session hack, and exposes a clean Thread API to your frontend. The wiring takes an afternoon. The upside is that you stop having "the bot forgot me" Slack messages forever.
Frequently asked questions
What is a thread in LangGraph?
A thread is a single conversation, identified by a thread_id you pass in the config. LangGraph scopes every checkpoint by thread, so 2 threads stay completely isolated. Most production apps generate a UUID per conversation, store it in their own database, and pass it into the graph on every call. Forgetting to pass thread_id is the most common reason an agent appears to lose memory.
How does LangGraph persistence work?
LangGraph writes a checkpoint to the configured backend after every node in the graph runs. Each checkpoint is a serialized snapshot of the state at that point, scoped by thread_id. On the next call with the same thread, the runtime loads the latest checkpoint and resumes from there. This makes the agent crash-safe: if a worker dies mid-graph, the next request picks up from the last completed node.
What is the difference between MemorySaver and AsyncPostgresSaver?
MemorySaver stores checkpoints in a Python dict inside the worker process. It is fast, requires no setup, and loses everything on restart or across replicas. AsyncPostgresSaver stores checkpoints in postgres, survives restarts, and is shared across every worker connected to the same database. Use MemorySaver for tests and notebooks. Use AsyncPostgresSaver for anything you ship.
Do I need a separate threads table if LangGraph already stores state?
Yes. The LangGraph checkpoint table only stores the serialized state blob, not user-facing metadata like title, owner, or archived status. Keep your own threads table for that information and use its primary key as the thread_id you pass into LangGraph. The 2 tables share a key but never share data, which keeps your application schema clean.
How do I prevent the LangGraph checkpoint table from growing forever?
Add a scheduled job that deletes checkpoints older than your retention window (30 to 90 days for most chat apps) or for archived threads. The checkpoint table grows by one row per node per message, so a chatty agent can accumulate millions of rows in weeks. LangGraph does not garbage-collect for you. Treat checkpoint cleanup the same way you treat log rotation.
Key takeaways
- The default in-memory checkpointer cannot survive multi-replica deploys, restarts, or crashes. Move to `AsyncPostgresSaver` before your first production user.
- A `thread_id` is the unit of persistence. Every `graph.invoke` call must pass one in the config or the agent has no memory.
- The checkpointer stores blob state, not metadata. Keep your own `threads` table for title, owner, and archive status, and use its primary key as the LangGraph `thread_id`.
- Initialize the checkpointer inside the FastAPI lifespan and call `saver.setup()` on startup. The `async with` block is mandatory.
- Add a checkpoint cleanup job from day one. The table grows by one row per node per message and nothing prunes it for you.
- To see persistence wired into a complete agent stack with retrieval and tool use, walk through the Agentic RAG Masterclass or start with the AI Agents Fundamentals primer.
The official LangGraph persistence concepts page is the source of truth for the checkpointer API and worth bookmarking. The recipes here are how I configure it for services that need to survive real traffic.
Ready to go deeper?
Go beyond articles. Build production AI systems with hands-on workshops and our intensive AI Bootcamp.