Your LangGraph demo works in a notebook and breaks behind FastAPI

You prototyped your chatbot in a Jupyter notebook with LangGraph. State flows through the graph, the checkpointer saves to SQLite, it feels like magic. You wrap it in a FastAPI /chat route, deploy, and 3 things immediately break: every request shares the same thread, there is no streaming, and a deploy restart kills every in-flight conversation.

The fix is not "use LangGraph harder". The fix is to bridge LangGraph's thread model to FastAPI's request model with 3 specific decisions: per-request thread IDs from the user, streaming via Server-Sent Events, and a Postgres checkpointer that survives restarts.

This post is the FastAPI + LangGraph production pattern I ship on every chatbot: thread-per-user routing, SSE streaming, durable checkpointing, and the error recovery that makes it a real service.

Why does the naive LangGraph + FastAPI wiring fail?

Because LangGraph's default checkpointer is in-memory, and notebook demos hard-code a single thread ID (or reuse one config dict for every call). Under concurrent FastAPI traffic, every request writes to the same thread and every user sees everyone else's messages. Plus:

  1. No streaming. Default graph.invoke returns a single dict. Users wait 5-8 seconds with no visible progress. They think the service is dead.
  2. Ephemeral state. MemorySaver dies with the process. Every deploy wipes every conversation.
  3. No per-user isolation. Without a thread ID per user+session, LangGraph treats all traffic as one long conversation.

The production pattern fixes all 3 by passing the user's session ID as the thread ID, swapping the checkpointer for Postgres, and using graph.astream to push tokens down a Server-Sent Events stream.

graph LR
    User[User request] --> Route[FastAPI /chat]
    Route --> ThreadID[session_id → thread_id]
    ThreadID --> Graph[LangGraph astream]
    Graph --> Pg[(Postgres checkpointer)]
    Graph --> Tokens[Token stream]
    Tokens --> SSE[SSE response]
    SSE --> User

    style Graph fill:#dbeafe,stroke:#1e40af
    style Pg fill:#dcfce7,stroke:#15803d

How do you wire LangGraph into a FastAPI route?

3 moves: compile the graph once at startup, route the request's session ID as the thread ID, stream the response.

# filename: app/main.py
# description: FastAPI + LangGraph production chatbot API.
# Compile graph at startup, persist to Postgres, stream via SSE.
from contextlib import asynccontextmanager
from fastapi import FastAPI, Depends
from fastapi.responses import StreamingResponse
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
from app.domain.graph import build_graph
from app.auth import AuthContext, get_auth


@asynccontextmanager
async def lifespan(app: FastAPI):
    async with AsyncPostgresSaver.from_conn_string(
        'postgresql://user:pass@localhost/agents'
    ) as checkpointer:
        await checkpointer.setup()
        app.state.graph = build_graph(checkpointer=checkpointer)
        yield


app = FastAPI(lifespan=lifespan)


@app.post('/chat')
async def chat(
    body: dict,
    auth: AuthContext = Depends(get_auth),
):
    thread_id = f'{auth.tenant_id}:{auth.user_id}:{body["session_id"]}'
    config = {'configurable': {'thread_id': thread_id}}
    initial_state = {
        'messages': [{'role': 'user', 'content': body['message']}],
    }

    async def event_stream():
        async for event in app.state.graph.astream(initial_state, config):
            for node_name, node_output in event.items():
                msgs = node_output.get('messages', [])
                if msgs:
                    last = msgs[-1]
                    content = last.get('content', '') if isinstance(last, dict) else last.content
                    yield f'data: {content}\n\n'
        yield 'data: [DONE]\n\n'

    return StreamingResponse(
        event_stream(),
        media_type='text/event-stream',
        headers={'X-Accel-Buffering': 'no', 'Cache-Control': 'no-cache'},
    )

5 decisions in this file matter:

  1. The graph is compiled once at startup via lifespan, not per request.
  2. The checkpointer is AsyncPostgresSaver, not MemorySaver.
  3. The thread ID encodes tenant + user + session to isolate every conversation.
  4. graph.astream emits events as the graph runs; we forward them as SSE.
  5. X-Accel-Buffering: no disables Nginx buffering so tokens actually stream.

For the broader FastAPI streaming pattern including Nginx gotchas, see the FastAPI and Uvicorn for production agentic AI systems post.

Why is the thread ID composite?

Because LangGraph treats thread_id as the conversation key. If you use just session_id, two users with the same session ID collide. If you use just user_id, every message from the same user goes into one thread. The composite tenant:user:session guarantees isolation while still letting one user have multiple parallel conversations (one session per browser tab).
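A tiny helper makes the composition explicit. The function name and the ':' delimiter are arbitrary choices for this sketch; any scheme works as long as the result is unique per tenant + user + session and the delimiter cannot appear inside the identifiers themselves:

```python
# description: compose a collision-free LangGraph thread ID (hypothetical helper).
def make_thread_id(tenant_id: str, user_id: str, session_id: str) -> str:
    # ':' is an arbitrary delimiter; pick one that cannot occur in the IDs.
    return f'{tenant_id}:{user_id}:{session_id}'


# Two users with the same session ID land on different threads.
assert make_thread_id('acme', 'alice', 'tab-1') != make_thread_id('acme', 'bob', 'tab-1')

# One user can run parallel conversations, one per browser tab.
assert make_thread_id('acme', 'alice', 'tab-1') != make_thread_id('acme', 'alice', 'tab-2')
```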

The same pattern applies to every multi-tenant agent service. For the full multi-tenant schema, see the User and session models for multi-tenant AI agents post.

How does the Postgres checkpointer survive restarts?

AsyncPostgresSaver writes graph state to Postgres tables after every node transition. On the next request for the same thread ID, LangGraph reads the last checkpoint and continues from there. A deploy restart is invisible to users because the thread state is not in the process.

# filename: app/domain/graph.py
# description: LangGraph state graph with a passed-in checkpointer.
from langgraph.graph import StateGraph, END
from typing import TypedDict


class ChatState(TypedDict):
    messages: list[dict]


def llm_node(state: ChatState) -> dict:
    # Call the LLM, append the response
    from app.infra.llm import call_llm
    reply = call_llm(state['messages'])
    return {'messages': state['messages'] + [reply]}


def build_graph(checkpointer):
    builder = StateGraph(ChatState)
    builder.add_node('llm', llm_node)
    builder.set_entry_point('llm')
    builder.add_edge('llm', END)
    return builder.compile(checkpointer=checkpointer)

Pass the checkpointer at compile time. Every invoke or astream with a thread_id now persists to Postgres automatically.
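The mechanics are easy to model: a checkpointer is essentially a store keyed by thread_id, and each run loads the last checkpoint for its thread before appending. This toy sketch is not LangGraph's API, just the concept; swap the in-process dict for Postgres and the state survives restarts:

```python
# description: toy model of checkpoint-per-thread semantics (NOT LangGraph's API).
# A real deployment replaces this dict with Postgres tables, which is
# exactly why state survives process restarts.
checkpoints: dict[str, list[dict]] = {}


def run_turn(thread_id: str, user_msg: str) -> list[dict]:
    history = checkpoints.get(thread_id, [])              # load last checkpoint
    history = history + [{'role': 'user', 'content': user_msg}]
    history = history + [{'role': 'assistant', 'content': f'echo: {user_msg}'}]
    checkpoints[thread_id] = history                      # persist new checkpoint
    return history


run_turn('acme:alice:tab-1', 'hello')
state = run_turn('acme:alice:tab-1', 'again')
assert len(state) == 4                                    # both turns retained
assert 'acme:alice:tab-2' not in checkpoints              # other threads untouched
```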

For the LangGraph persistence deep-dive including AsyncPostgresSaver setup, see the LangGraph persistence thread models post.

How do you handle errors without breaking the SSE stream?

Wrap the graph iteration in a try/except inside the generator. Emit an error event on failure instead of raising (which would break the SSE connection).

# filename: app/routes/chat.py
# description: Error-safe SSE generator for LangGraph output.
async def safe_event_stream(graph, initial_state, config):
    try:
        async for event in graph.astream(initial_state, config):
            for node_name, output in event.items():
                msgs = output.get('messages', [])
                if msgs:
                    content = msgs[-1]['content'] if isinstance(msgs[-1], dict) else msgs[-1].content
                    yield f'data: {content}\n\n'
    except Exception as exc:
        # SSE data lines must not contain raw newlines; collapse them.
        msg = str(exc).replace('\n', ' ')
        yield f'event: error\ndata: {msg}\n\n'
    finally:
        yield 'data: [DONE]\n\n'

The finally block guarantees a [DONE] event even on error, so clients can close the stream cleanly.
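On the client side, parsing these frames is mechanical. A minimal parser for the wire format used above (blank-line-separated frames with event: and data: fields); this is a sketch, not a full SSE implementation, and it ignores id:, retry:, and comment lines:

```python
# description: minimal parser for the SSE frames emitted above (sketch only).
def parse_sse(raw: str) -> list[dict]:
    """Split a raw SSE body into [{'event': ..., 'data': ...}] frames."""
    frames = []
    for block in raw.split('\n\n'):
        if not block.strip():
            continue
        frame = {'event': 'message', 'data': []}
        for line in block.split('\n'):
            if line.startswith('event: '):
                frame['event'] = line[len('event: '):]
            elif line.startswith('data: '):
                frame['data'].append(line[len('data: '):])
        frame['data'] = '\n'.join(frame['data'])
        frames.append(frame)
    return frames


body = 'data: Hello\n\nevent: error\ndata: boom\n\ndata: [DONE]\n\n'
frames = parse_sse(body)
assert frames[0] == {'event': 'message', 'data': 'Hello'}
assert frames[1] == {'event': 'error', 'data': 'boom'}
assert frames[2]['data'] == '[DONE]'
```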

What to do Monday morning

  1. If your LangGraph chatbot uses MemorySaver, swap to AsyncPostgresSaver. This single change makes state survive restarts.
  2. Move graph compilation into FastAPI's lifespan block. Compile once per worker, not per request.
  3. Build the thread ID from tenant + user + session. Confirm two users do not share a thread under load.
  4. Replace graph.invoke with graph.astream in the route body. Return a StreamingResponse with text/event-stream media type.
  5. Set X-Accel-Buffering: no on the response. Without it, Nginx buffers the entire response and you get no streaming.
  6. Test the error path: make the LLM client raise on the 3rd request and confirm the SSE stream closes with an error event instead of hanging.
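Step 6 does not need a live LLM. The generator's error contract can be unit-tested with a fake graph whose stream raises mid-run; FailingGraph and the inlined generator here are test scaffolding under the same contract as the route code, not LangGraph types:

```python
# description: unit test for the SSE error contract with a fake graph (scaffolding).
import asyncio


class FailingGraph:
    """Stand-in for a compiled graph whose stream blows up mid-run."""
    async def astream(self, state, config):
        yield {'llm': {'messages': [{'role': 'assistant', 'content': 'partial'}]}}
        raise RuntimeError('LLM timeout')


async def safe_event_stream(graph, initial_state, config):
    # Same contract as the route generator: error event on failure,
    # [DONE] sentinel always.
    try:
        async for event in graph.astream(initial_state, config):
            for _node, output in event.items():
                msgs = output.get('messages', [])
                if msgs:
                    yield f"data: {msgs[-1]['content']}\n\n"
    except Exception as exc:
        yield f'event: error\ndata: {exc}\n\n'
    finally:
        yield 'data: [DONE]\n\n'


async def collect():
    return [chunk async for chunk in safe_event_stream(FailingGraph(), {}, {})]


chunks = asyncio.run(collect())
assert chunks[0] == 'data: partial\n\n'
assert chunks[1] == 'event: error\ndata: LLM timeout\n\n'
assert chunks[-1] == 'data: [DONE]\n\n'
```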

The headline: LangGraph + FastAPI is ready for production when you compose thread IDs from user identity, checkpoint to Postgres, and stream through SSE. The demo graph is the same; the 3 wiring decisions are the difference.

Frequently asked questions

Why use LangGraph with FastAPI instead of a custom agent loop?

LangGraph gives you 3 things for free: a state model that survives restarts via checkpointers, a visual graph you can render and debug, and conditional routing without manual if-branches. FastAPI gives you HTTP, auth, and streaming. Together they cover the agent loop and the service layer. Custom agent code can do the same but takes significantly more work to get production-grade.

How do I isolate conversations between users in LangGraph?

Use a composite thread_id like tenant_id:user_id:session_id. LangGraph treats each unique thread ID as a separate conversation, so tenants and users never collide. The session ID segment lets one user have multiple parallel conversations (browser tabs) without interference.

Why is AsyncPostgresSaver better than MemorySaver?

Because MemorySaver is process-local and dies with the worker. A deploy restart wipes every conversation. AsyncPostgresSaver persists every state transition to Postgres, so restarts are invisible to users. It also lets multiple workers share the same checkpoint store, which is required for any multi-worker deployment.

How do I stream LangGraph output through FastAPI?

Use graph.astream inside an async generator, yield each event as a server-sent event (data: <payload>\n\n), and wrap the generator in a StreamingResponse with media_type='text/event-stream'. Set X-Accel-Buffering: no on the response headers so Nginx does not buffer the entire response before sending.

What happens if the LLM call fails mid-stream?

Wrap the graph iteration in a try/except inside the generator. On failure, yield an error event (event: error\ndata: <msg>\n\n) instead of raising. Put the [DONE] sentinel in a finally block so the client always sees a clean close. This keeps the SSE connection well-behaved under any error.

Key takeaways

  1. LangGraph in a notebook is not production. The 3 missing pieces are per-user thread isolation, durable state, and streaming responses.
  2. Use AsyncPostgresSaver as the checkpointer, not MemorySaver. State survives restarts and multi-worker deployments.
  3. Compose thread IDs from tenant:user:session. This guarantees isolation while supporting multiple parallel conversations per user.
  4. Stream via graph.astream inside an SSE generator. Set X-Accel-Buffering: no so Nginx does not buffer the stream.
  5. Handle errors inside the generator, not by letting them propagate. Yield an error event and always close with [DONE] in a finally block.
  6. To see LangGraph wired into a full production chatbot with auth, observability, and tool calling, walk through the Build your own coding agent course, or start with the AI Agents Fundamentals primer.

For the full LangGraph persistence documentation covering AsyncPostgresSaver configuration, thread IDs, and checkpoint introspection, see the LangGraph persistence docs.
