Your agent restarts and every conversation disappears

A user is mid-conversation with your agent. You ship a deploy. The container restarts. The user sends their next message and the agent has no idea who they are. The "memory" was in a Python dict that only lived for the process lifetime. Every restart is a mass amnesia event.

This is the default failure mode of a dev-built agent. The fix is state persistence in Postgres (or any durable store), written after every turn, loaded at the start of every turn. Same LLM, same tools, same prompts, but the session survives restarts, deploys, and autoscaling events.

This post is the Postgres state persistence pattern: the schema, the session writer, the idempotency rule that prevents duplicate writes under retry, and the refactor from an in-memory dict to a durable store.

Why is an in-memory session dict not enough?

Because the dict lives in the process and dies with it. Every scale-out event, every rolling deploy, every worker crash wipes it. 3 specific failure modes:

  1. Restart amnesia. The container restarts and every active session disappears.
  2. Horizontal split. With multiple workers behind a load balancer, worker A has session X in memory, worker B does not. The user's next request lands on B and the session is empty.
  3. No audit trail. You cannot debug a bad answer from 2 days ago because the conversation was never written down.

Persistent state fixes all 3 by moving the session into Postgres. Every worker reads and writes the same rows. Restarts are invisible. Debugging works.

graph LR
    User[User] --> LB[Load balancer]
    LB --> W1[Worker 1]
    LB --> W2[Worker 2]
    LB --> W3[Worker 3]

    W1 --> DB[(Postgres<br/>sessions + messages)]
    W2 --> DB
    W3 --> DB

    style DB fill:#dcfce7,stroke:#15803d

What schema do you need?

2 tables: sessions and messages. Keep them simple.

# filename: app/infra/models.py
# description: Minimal SQLModel schema for agent state persistence.
from datetime import datetime
from typing import Optional

from sqlalchemy import JSON, Column
from sqlmodel import SQLModel, Field


class Session(SQLModel, table=True):
    id: str = Field(primary_key=True)  # UUID from the client
    user_id: str = Field(index=True)
    tenant_id: int = Field(index=True)
    created_at: datetime = Field(default_factory=datetime.utcnow)
    last_activity: datetime = Field(default_factory=datetime.utcnow, index=True)
    status: str = Field(default='active')  # active, archived


class Message(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    session_id: str = Field(foreign_key='session.id', index=True)
    role: str  # user, assistant, tool
    content: str
    tool_calls: Optional[dict] = Field(default=None, sa_column=Column(JSON))
    created_at: datetime = Field(default_factory=datetime.utcnow)
    idempotency_key: Optional[str] = Field(default=None, unique=True, index=True)

3 design decisions. Session ID is a client-supplied UUID, not a DB sequence, so retries on the same logical turn resolve to the same session without a round trip. Messages are append-only. The idempotency_key prevents duplicate writes on retry (a common bug when the LLM call succeeds but the DB write times out and the client retries).
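For readers who prefer raw SQL, the two models map to roughly this DDL. This is approximate: SQLModel emits the exact statements, and column types vary slightly by dialect.

```sql
-- Approximate DDL for the two tables (SQLModel generates the exact form)
CREATE TABLE session (
    id VARCHAR PRIMARY KEY,              -- client-supplied UUID
    user_id VARCHAR NOT NULL,
    tenant_id INTEGER NOT NULL,
    created_at TIMESTAMP NOT NULL,
    last_activity TIMESTAMP NOT NULL,
    status VARCHAR NOT NULL DEFAULT 'active'
);

CREATE TABLE message (
    id SERIAL PRIMARY KEY,
    session_id VARCHAR NOT NULL REFERENCES session (id),
    role VARCHAR NOT NULL,
    content VARCHAR NOT NULL,
    tool_calls JSON,
    created_at TIMESTAMP NOT NULL,
    idempotency_key VARCHAR UNIQUE       -- the retry guard
);
```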

For the multi-tenant guard that pairs with this schema, see the User and session models for multi-tenant AI agents post.

How do you write a session loader and writer?

Both are small. The loader reads the session + last N messages. The writer appends a new message with an idempotency key.

# filename: app/infra/session_store.py
# description: Postgres-backed session store for an agent service.
import hashlib

from sqlalchemy.exc import IntegrityError
from sqlmodel import Session as DbSession, select

from app.infra.models import Message


class PostgresSessionStore:
    def __init__(self, db: DbSession):
        # Note: SQLModel's Session is synchronous. The methods are async to
        # match the service layer; a production service would run them in a
        # threadpool or use an async driver to avoid blocking the event loop.
        self.db = db

    async def load(self, session_id: str, limit: int = 40) -> list[dict]:
        stmt = (
            select(Message)
            .where(Message.session_id == session_id)
            .order_by(Message.created_at.desc())
            .limit(limit)
        )
        rows = list(self.db.exec(stmt))
        rows.reverse()  # chronological order
        return [{'role': r.role, 'content': r.content} for r in rows]

    async def append(self, session_id: str, role: str, content: str, idem_key: str | None = None) -> None:
        # The derived key must be deterministic: a retry of the same turn has
        # to produce the same key, so never mix a timestamp into it. For turns
        # that may legitimately repeat identical content, pass an explicit
        # idem_key (for example a client-generated request ID or turn number).
        key = idem_key or hashlib.sha256(
            f'{session_id}:{role}:{content}'.encode()
        ).hexdigest()
        # Fast path: skip if this turn was already written (idempotency)
        existing = self.db.exec(
            select(Message).where(Message.idempotency_key == key)
        ).first()
        if existing:
            return  # already wrote this turn
        msg = Message(
            session_id=session_id,
            role=role,
            content=content,
            idempotency_key=key,
        )
        self.db.add(msg)
        try:
            self.db.commit()
        except IntegrityError:
            # A concurrent worker won the race. The unique index on
            # idempotency_key is the hard guarantee; the check above
            # only saves a round trip.
            self.db.rollback()

The idempotency key is the load-bearing bit. Without it, a retry that succeeds on the second attempt writes the same message twice and corrupts the history. The unique index on the key is the hard guarantee; the pre-write check just avoids a wasted round trip.

What is the idempotency rule?

Every write that might be retried needs an idempotency key. The key is deterministic: a hash of the turn's identifying fields (session ID, role, content, plus a turn number or client request ID when identical turns can legitimately repeat). Before writing, check whether a row with that key already exists. If yes, skip. If no, write.

This means:

  • Retry is safe. Second attempt sees the row from the first attempt and skips.
  • Concurrent writes are safe. The unique index on idempotency_key catches double-writes at the database level.
  • Debugging is easy. You can search by key to see exactly which turn a given attempt landed on.

Without idempotency, the common failure sequence is: LLM call succeeds → DB write times out → client retries → LLM call succeeds again → DB write succeeds → same message is written twice with slightly different timestamps. The conversation history is now corrupted.
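The rule can be verified end to end in a few lines. This self-contained sketch uses the stdlib sqlite3 in place of Postgres (both accept ON CONFLICT ... DO NOTHING against a unique index), and append_turn is a hypothetical helper mirroring the store's append:

```python
# Idempotent append demo: a retried write with the same key is a no-op.
# sqlite3 stands in for Postgres; the ON CONFLICT clause works in both.
import hashlib
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE message ('
    ' id INTEGER PRIMARY KEY,'
    ' session_id TEXT, role TEXT, content TEXT,'
    ' idempotency_key TEXT UNIQUE)'
)

def append_turn(session_id: str, turn: int, role: str, content: str) -> None:
    # Deterministic key: the same logical turn always hashes to the same value.
    key = hashlib.sha256(f'{session_id}:{turn}:{role}:{content}'.encode()).hexdigest()
    # The unique index makes the retry a database-level no-op.
    conn.execute(
        'INSERT INTO message (session_id, role, content, idempotency_key) '
        'VALUES (?, ?, ?, ?) ON CONFLICT(idempotency_key) DO NOTHING',
        (session_id, role, content, key),
    )
    conn.commit()

append_turn('s1', 1, 'user', 'hello')
append_turn('s1', 1, 'user', 'hello')  # simulated client retry
count = conn.execute('SELECT COUNT(*) FROM message').fetchone()[0]
print(count)  # → 1: the retry was absorbed, history is not corrupted
```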

How do you wire the store into the agent loop?

The service function takes the store as a dependency (same pattern as the Service layer for AI agents post):

# filename: app/services/chat.py
# description: Agent chat service using a durable session store.
from app.infra.session_store import PostgresSessionStore
from app.infra.llm import call_llm


async def run_chat_turn(session_id: str, user_message: str, store: PostgresSessionStore) -> str:
    history = await store.load(session_id)
    await store.append(session_id, 'user', user_message)

    messages_for_llm = history + [{'role': 'user', 'content': user_message}]
    answer = await call_llm(messages_for_llm)

    await store.append(session_id, 'assistant', answer)
    return answer

Load before the LLM call, write after. If the LLM call fails, the user message is already saved, so the retry can skip it. If the DB write fails, the next request loads without the latest message and the turn is replayed transparently.

For the FastAPI lifespan pattern that instantiates the store once per worker, see the FastAPI and Uvicorn for production agentic AI systems post.

How do you prune old sessions?

Add a background job that archives sessions with last_activity < 30 days ago. Move them to a separate archive table or drop them entirely based on retention policy. Run the job daily off-peak.

-- Archive sessions older than 30 days
UPDATE session
SET status = 'archived'
WHERE last_activity < NOW() - INTERVAL '30 days'
  AND status = 'active';

This keeps the hot session table fast without losing data for compliance or auditing. For GDPR or similar compliance, add a separate deletion job with a longer window.
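Wrapped in a job function, the statement above looks like the sketch below. It is self-contained (sqlite3 stands in for Postgres, and archive_stale_sessions is a hypothetical name); in production the job would run the SQL through your normal DB session on a daily scheduler.

```python
# Daily archive job sketch: flips stale active sessions to 'archived'.
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE session (id TEXT PRIMARY KEY, last_activity TEXT, status TEXT)')
now = datetime(2024, 6, 1)
conn.executemany(
    'INSERT INTO session VALUES (?, ?, ?)',
    [
        ('stale', (now - timedelta(days=45)).isoformat(), 'active'),
        ('fresh', (now - timedelta(days=2)).isoformat(), 'active'),
    ],
)

def archive_stale_sessions(conn: sqlite3.Connection, now: datetime, days: int = 30) -> int:
    # ISO-8601 strings compare chronologically, so a plain < works here.
    cutoff = (now - timedelta(days=days)).isoformat()
    cur = conn.execute(
        "UPDATE session SET status = 'archived' "
        "WHERE last_activity < ? AND status = 'active'",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount  # number of sessions archived this run

archived = archive_stale_sessions(conn, now)
print(archived)  # → 1: only the stale session was archived
```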

For the connection pooling setup that the store relies on, see the Connection pooling in production Python AI services post.

What to do Monday morning

  1. If your agent state lives in a Python dict, plan the migration. Every restart until you fix this is a user-visible bug.
  2. Add the 2 tables (sessions, messages) to your Postgres schema. Use SQLModel or plain SQLAlchemy.
  3. Write the store class with load and append. Add the idempotency key check to append.
  4. Refactor the agent loop to read/write via the store instead of a dict. The loop logic does not change, only the storage call.
  5. Deploy. Restart the service. Continue a conversation that was in flight. Confirm the agent remembers the user's last message.
  6. Add the daily archive job for sessions older than 30 days.

The headline: state persistence is Postgres plus a 40-line store plus an idempotency key. Every restart becomes invisible to users. Debugging improves by orders of magnitude.

Frequently asked questions

Why does an agent need state persistence at all?

Because agent sessions span multiple HTTP turns and users expect the conversation to survive restarts and deploys. An in-memory dict loses everything on restart, which is a user-visible bug. Persisting to Postgres (or any durable store) makes sessions survive restarts, rolling deploys, autoscaling events, and worker crashes.

Why use Postgres instead of Redis for agent state?

Because Postgres is transactional, queryable, and gives you a permanent audit trail. Redis is fine for short-lived caches, but agent sessions are an auditable resource that compliance teams may need to inspect months later. Postgres also supports rich queries (find all sessions for user X in the last week), which Redis does not do well.
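For example, the kind of audit query Redis cannot express directly is a single SELECT against the schema above (user ID is a placeholder):

```sql
-- All sessions for one user in the last 7 days, most recent first
SELECT id, last_activity, status
FROM session
WHERE user_id = 'user-123'
  AND last_activity > NOW() - INTERVAL '7 days'
ORDER BY last_activity DESC;
```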

What is an idempotency key and why does a writer need one?

An idempotency key is a deterministic hash of the write contents that lets you detect duplicate writes. Without it, a retry after a timeout can write the same message twice, corrupting the conversation. With it, the second write sees the row from the first attempt and skips. Every write that might be retried needs one; the unique index on the key enforces it at the database level.

How do you handle concurrent writes from multiple workers?

The unique index on the idempotency_key column catches concurrent double-writes at the database level. Beyond that, use row-level locks (SELECT ... FOR UPDATE) if two workers might modify the same session, though in practice most agent sessions only have one active worker at a time because the load balancer pins the session.
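The lock pattern, sketched as plain SQL (the session ID is a placeholder):

```sql
BEGIN;
-- Blocks other workers touching this session row until COMMIT
SELECT * FROM session WHERE id = 'session-uuid' FOR UPDATE;
-- ... read and modify the session, append messages ...
COMMIT;
```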

How do you prune old sessions without losing audit data?

Add a background job that marks sessions older than 30 days as archived instead of deleting them. The archive table is read-only and cold. For GDPR compliance, add a separate deletion job on a longer window (90-365 days depending on policy). Keep the daily archive job off-peak so it does not compete with user traffic.

Key takeaways

  1. In-memory session dicts die with the process. Every restart is a mass amnesia event.
  2. Persist to Postgres with 2 tables: sessions and append-only messages. Session ID is a client-supplied UUID.
  3. Idempotency keys on every write prevent double-writes under retry. The unique index enforces it at the database level.
  4. Load before the LLM call, write after. If either step fails, the next request transparently replays the turn.
  5. Archive sessions older than 30 days with a daily background job. Keep the hot table fast without losing audit data.
  6. To see state persistence wired into a full production agent stack with auth, observability, and tools, walk through the Build your own coding agent course, or start with the AI Agents Fundamentals primer.

For the full SQLModel documentation on foreign keys, unique constraints, and relationship loading, see the SQLModel docs. The idempotency pattern works with plain SQLAlchemy too; see the SQLAlchemy unique constraint guide for the underlying primitives.

