Your service leaks database connections until it dies at 2am

Your FastAPI agent service runs fine for hours. Then it starts returning 500s. Logs say QueuePool limit of size 20 overflow 10 reached. You restart and it works again for a few hours. The problem is not the pool size. The problem is that somewhere in your code, an async function is acquiring a connection and not releasing it when an exception fires, or when a request is cancelled mid-flight.

The fix is async context managers everywhere that matters. async with guarantees cleanup on exception, cancellation, and timeout alike. The pattern is a few lines shorter than try/finally and one keyword away from sync with, but getting the details right is the difference between a service that leaks connections and one that does not.

This post is the async context management pattern for Python AI services: FastAPI lifespan for startup, async context managers for per-request resources, contextvars for request-scoped state, and the 3 bugs I see every agent project ship.

Why do sync patterns fail in async Python?

Because try/finally in a sync function looks identical to try/finally in an async function, but behaves differently under cancellation. An async task can be cancelled at any await point, which means CancelledError can surface in the middle of your block, and any cleanup that itself awaits inside finally can be interrupted before it completes. An async context manager wires the release into __aexit__, so it cannot be forgotten and it runs even when the block exits via cancellation.

3 specific failure modes:

  1. Unclosed connections on exception. Code path acquires a connection, raises mid-processing, and the connection leaks because the cleanup was not in a context manager.
  2. Leak on cancellation. Client disconnects mid-request. FastAPI cancels the task. The finally block starts, but the await inside it is interrupted before the connection ever returns to the pool.
  3. Double-release. Cleanup code runs twice because of a bug in the retry loop, releasing a connection that was already released.

Async context managers (async with) handle all 3 correctly because the cleanup path is wired into the protocol at the Python level, not bolted on via try/finally.

graph TD
    Request[Request arrives] --> Enter[__aenter__: acquire resource]
    Enter --> Body[handler runs await calls]
    Body -->|success| Exit[__aexit__: release resource]
    Body -->|exception| ExitErr[__aexit__: release + re-raise]
    Body -->|cancellation| ExitCancel[__aexit__: release]
    Exit --> Response[Response]
    ExitErr --> Error[500 returned]
    ExitCancel --> Close[Connection closed]

    style Enter fill:#dbeafe,stroke:#1e40af
    style Exit fill:#dcfce7,stroke:#15803d
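The cancellation leg of that diagram is the one worth proving to yourself. Here is a minimal stdlib-only sketch (Resource is a hypothetical stand-in for a pooled connection) showing that __aexit__ still runs when the task is cancelled mid-await:

```python
import asyncio


class Resource:
    """Hypothetical stand-in for a pooled connection."""
    def __init__(self):
        self.released = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.released = True  # runs on success, exception, AND cancellation
        return False          # never swallow the exception


async def handler(res: Resource):
    async with res:
        await asyncio.sleep(10)  # task gets cancelled while suspended here


async def main() -> bool:
    res = Resource()
    task = asyncio.create_task(handler(res))
    await asyncio.sleep(0)  # let the handler reach its await point
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return res.released


print(asyncio.run(main()))  # True: the resource was released despite cancellation
```

Swap Resource for a real session factory and the guarantee is the same: the release path is part of the protocol, not something each call site has to remember.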

What is the FastAPI lifespan pattern?

lifespan is an async context manager that runs once at app startup and once at shutdown. It is the canonical place to initialize resources that live for the process lifetime: LLM clients, database engines, Redis pools, vector store clients.

# filename: app/main.py
# description: FastAPI lifespan initializes long-lived resources.
from contextlib import asynccontextmanager
from fastapi import FastAPI
from sqlalchemy.ext.asyncio import create_async_engine, async_sessionmaker
from redis.asyncio import Redis
from anthropic import AsyncAnthropic


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: acquire resources
    app.state.engine = create_async_engine(
        'postgresql+asyncpg://...',
        pool_size=20,
        pool_pre_ping=True,
    )
    app.state.session_factory = async_sessionmaker(app.state.engine)
    app.state.redis = Redis.from_url('redis://localhost:6379')  # from_url is sync; connects lazily
    app.state.llm = AsyncAnthropic()

    yield  # App runs here

    # Shutdown: release resources in reverse order
    await app.state.redis.aclose()
    await app.state.engine.dispose()

3 details that matter. The yield marks the entire app runtime: everything before it is startup, everything after is shutdown. Resources are released in reverse order of acquisition. app.state is the recommended place for instances that routes access via request.app.state.
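One way to make the reverse-order teardown automatic, and to survive a startup that fails halfway through, is contextlib.AsyncExitStack. A stdlib-only sketch with dummy resources (FakeClient is hypothetical; in the lifespan above the registered callbacks would be engine.dispose and redis.aclose):

```python
import asyncio
from contextlib import AsyncExitStack


class FakeClient:
    """Hypothetical stand-in for an engine or Redis client."""
    def __init__(self, name: str, closed: list):
        self.name, self.closed = name, closed

    async def aclose(self):
        self.closed.append(self.name)


async def main() -> list:
    closed = []
    async with AsyncExitStack() as stack:
        engine = FakeClient('engine', closed)
        stack.push_async_callback(engine.aclose)
        redis = FakeClient('redis', closed)
        stack.push_async_callback(redis.aclose)
        # the app would run here (the lifespan yield)
    return closed


print(asyncio.run(main()))  # ['redis', 'engine']: released in reverse order of acquisition
```

If acquiring the second resource raises, the stack still unwinds the first one, which a flat sequence of awaits after yield does not guarantee.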

For the broader FastAPI production setup, see the FastAPI and Uvicorn for production agentic AI systems post.

How do you manage per-request resources?

With async context managers in the request handler or via Depends. Never with bare try/finally.

# filename: app/routes/chat.py
# description: Per-request database session managed via async with.
from fastapi import APIRouter, Request
from sqlalchemy.ext.asyncio import AsyncSession

router = APIRouter()


@router.post('/chat')
async def chat(request: Request, body: dict):
    async with request.app.state.session_factory() as session:
        async with session.begin():
            # Everything inside this block uses the session.
            # On exception or cancellation, session.close() runs automatically.
            result = await run_chat_turn(session, body)
            return result

async with session.begin() wraps the body in a transaction. On success, commit. On exception, rollback. Connection returns to the pool cleanly regardless. No try/finally, no manual session.close().
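What session.begin() guarantees can be sketched with a stdlib-only stand-in (FakeTransaction is hypothetical, not SQLAlchemy's API): commit on clean exit, rollback on any exception, re-raise either way:

```python
import asyncio


class FakeTransaction:
    """Hypothetical stand-in for session.begin(): commit on success, rollback on error."""
    def __init__(self):
        self.outcome = None

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.outcome = 'rollback' if exc_type else 'commit'
        return False  # re-raise whatever the block raised


async def demo(fail: bool) -> str:
    tx = FakeTransaction()
    try:
        async with tx:
            if fail:
                raise RuntimeError('mid-transaction failure')
    except RuntimeError:
        pass
    return tx.outcome


print(asyncio.run(demo(False)))  # commit
print(asyncio.run(demo(True)))   # rollback
```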

What are contextvars and when do you need them?

contextvars is Python's built-in mechanism for request-scoped state in async code. Think of it as thread-local storage for asyncio tasks. Each request gets its own "context", and variables set via contextvars are isolated from other concurrent requests.

# filename: app/context.py
# description: Request context using contextvars for logging + tracing.
from contextvars import ContextVar
from uuid import uuid4

request_id: ContextVar[str] = ContextVar('request_id', default='')
user_id: ContextVar[str] = ContextVar('user_id', default='')


def set_request_context(rid: str, uid: str) -> None:
    # ContextVar.set is synchronous; no await needed.
    request_id.set(rid)
    user_id.set(uid)


def get_request_id() -> str:
    return request_id.get()

Use contextvars when you need to propagate state (request ID, user ID, trace ID) through deep call stacks without threading it through every function argument. Logging, observability, and structured tracing are the canonical use cases.
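The isolation claim is easy to verify: two concurrent tasks set the same ContextVar and each reads back its own value. A minimal sketch:

```python
import asyncio
from contextvars import ContextVar

request_id: ContextVar[str] = ContextVar('request_id', default='')


async def handle(rid: str) -> str:
    request_id.set(rid)
    await asyncio.sleep(0)   # the other task runs here
    return request_id.get()  # still our own value, not the other task's


async def main():
    return await asyncio.gather(handle('req-a'), handle('req-b'))


print(asyncio.run(main()))  # ['req-a', 'req-b']: no cross-talk between tasks
```

Each task created by gather gets its own copy of the context, so a set() in one task never bleeds into another.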

For the broader observability pattern, see the Langfuse integration for agentic AI tracing post.

How do you wire contextvars through FastAPI middleware?

Middleware sets the context at request entry, handlers read it anywhere, background tasks inherit it.

# filename: app/middleware.py
# description: Middleware sets request ID into contextvars for every request.
from uuid import uuid4
from fastapi import Request
from app.context import request_id


async def request_id_middleware(request: Request, call_next):
    rid = request.headers.get('x-request-id') or str(uuid4())
    token = request_id.set(rid)
    try:
        response = await call_next(request)
        response.headers['x-request-id'] = rid
        return response
    finally:
        request_id.reset(token)

The token and reset pattern ensures the context is cleaned up after the request, even if an exception fires inside call_next. This is the canonical way to use contextvars with middleware.
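To get that request ID into every log line, one option is a logging.Filter that reads the contextvar. A sketch, assuming the request_id variable mirrors the one in app/context.py (RequestIdFilter is a name I am inventing here):

```python
import logging
from contextvars import ContextVar

request_id: ContextVar[str] = ContextVar('request_id', default='-')


class RequestIdFilter(logging.Filter):
    """Copies the current request ID onto every log record."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id.get()
        return True


handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('%(request_id)s %(levelname)s %(message)s'))
logger = logging.getLogger('app')
logger.addHandler(handler)
logger.addFilter(RequestIdFilter())
logger.setLevel(logging.INFO)

token = request_id.set('req-123')
logger.info('chat turn started')  # logged as: req-123 INFO chat turn started
request_id.reset(token)
```

Because the middleware set the contextvar before the handler ran, every log line inside the request carries the ID with zero function arguments threaded through.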

What are the 3 bugs every agent project ships?

  1. Bare factory call without a context manager. Someone writes session = factory(), then forgets await session.close() on an error path. Leak.

  2. Catching CancelledError and swallowing it. Async Python relies on CancelledError propagating to unwind the task. Catching it and returning a value means the task never actually finishes cancelling: parent code sees a normal return, transactions commit instead of rolling back, and shutdown or timeout logic waiting on the cancellation can hang.

  3. Creating a new event loop inside a request. asyncio.run() raises RuntimeError when called from a running event loop, and driving coroutines on a loop from asyncio.new_event_loop() produces cryptic "attached to a different loop" errors. The fix: only use the already-running loop.
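Bug 2 has a safe escape hatch when you genuinely need cleanup on cancellation: catch, clean up, re-raise. A stdlib-only sketch:

```python
import asyncio


async def handler(state: dict):
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        state['cleaned'] = True  # do cleanup...
        raise                    # ...then ALWAYS re-raise


async def main():
    state = {'cleaned': False}
    task = asyncio.create_task(handler(state))
    await asyncio.sleep(0)  # let the handler reach its await point
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return state['cleaned'], task.cancelled()


print(asyncio.run(main()))  # (True, True): cleanup ran AND the task still reports cancelled
```

The re-raise is what keeps task.cancelled() true; drop it and the caller believes the request completed normally.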

For the full production setup with connection pooling that this pattern leans on, see the Connection pooling in production Python AI services post.

What to do Monday morning

  1. Audit your startup code. Everything that lives for the process lifetime (LLM client, DB engine, Redis, vector store) belongs in a lifespan block, not at module import time.
  2. Find every try/finally in your async code that releases a resource. Replace each with an async with context manager.
  3. Add a request_id_middleware that sets a request ID into contextvars at the start of every request. Use it in your logger formatter and observability spans.
  4. Grep for asyncio.new_event_loop and asyncio.run inside route handlers. Both are bugs. Remove them.
  5. Run a load test. Watch the database connection count in Postgres during and after the test. If it does not return to baseline after traffic stops, you still have a leak somewhere.

The headline: async context management is lifespan + async with + contextvars. 3 patterns, zero leaks, clean teardown on cancel or exception. The pattern is 30 minutes to migrate; the leak you fix is worth hours of production toil.

Frequently asked questions

Why is async context management different from sync context management?

Because async code can be cancelled at any await point, and hand-rolled try/finally cleanup is easy to get wrong under cancellation: the finally block runs, but awaits inside it can be interrupted and manual release steps get forgotten. async with wires the cleanup path into the __aenter__/__aexit__ protocol, guaranteeing release on exception, cancellation, and timeout. Sync-style manual cleanup in an async function is a subtle bug waiting to happen.

What does FastAPI's lifespan actually do?

lifespan is an async context manager that runs once when the FastAPI app starts and once when it shuts down. Everything before yield is startup (initialize databases, LLM clients, caches). Everything after yield is shutdown (release resources in reverse order). It replaces the older @app.on_event('startup') decorator which is now deprecated.

When should I use contextvars vs function arguments for request state?

Use function arguments for state that only 1-2 functions deep need to see. Use contextvars for cross-cutting state that needs to propagate through deep call stacks: request ID, user ID, trace ID, tenant ID. The canonical use case is logging, where every log line should include the request ID without threading it through every function.

How do contextvars interact with FastAPI middleware?

Middleware sets the contextvar at request entry using var.set(value), which returns a token. In a finally block, call var.reset(token) to restore the previous value. This pattern is safe under concurrent requests because each async task has its own context, and the reset ensures cleanup even on exception.

What is the most common async bug in Python AI services?

Catching CancelledError and swallowing it. When a client disconnects, CancelledError is raised in the request handler to unwind the task. If your code catches it and returns a value, the task never actually cancels: parent code sees a normal return, transactions commit instead of rolling back, and shutdown or timeout logic waiting on the cancellation can hang. The fix: never catch CancelledError unless you re-raise it after cleanup.

Key takeaways

  1. Sync try/finally does not handle async cancellation correctly. Use async with everywhere a resource needs cleanup.
  2. FastAPI lifespan is the right place to initialize long-lived resources. Everything before yield is startup, everything after is shutdown.
  3. Per-request resources (database sessions, transactions) belong in async with blocks inside the route handler or a Depends. Not in module globals.
  4. contextvars propagate request-scoped state (request ID, user ID) through deep call stacks. Set in middleware, read anywhere, reset on exit.
  5. Never catch CancelledError without re-raising. It is how async Python unwinds tasks cleanly.
  6. To see async context management wired into a full production agent stack with pools, sessions, and observability, walk through the Build your own coding agent course, or start with the AI Agents Fundamentals primer.

For the full Python asyncio documentation on context managers, cancellation, and contextvars, see the asyncio docs. The cancellation semantics explained there are worth reading before writing any serious async code.
