Your LLM client is instantiated at module import time and you wonder why deploys are flaky

Your agent service has client = AsyncAnthropic() at the top of llm.py. It worked on day 1. Now, months later, some deploys fail with "event loop is closed" errors, shutdown hangs in CI because the HTTP connection pool never drains, and debug logs show the same client instance shared across all 4 Uvicorn workers in ways you cannot explain.

The fix is FastAPI's lifespan context manager. Every process-scoped resource (LLM client, database engine, Redis, vector store) belongs in lifespan, not at module import time. lifespan runs once per worker at startup, gives you a clean place to tear down on shutdown, and eliminates the module-global gotchas entirely.

This post is the FastAPI lifespan pattern for agentic services: why module-level initialization is wrong, what belongs in lifespan, how ordered teardown prevents shutdown hangs, and the 3 bugs this pattern kills instantly.

Why is module-level initialization broken for async services?

Because module imports run before the event loop exists, which means any async code at module level either fails outright or creates a resource attached to a loop that is about to be replaced. 4 specific failure modes:

  1. Event-loop-mismatch errors. You instantiate an AsyncAnthropic client at module level, and its async internals bind to whatever loop exists at import time. FastAPI then creates a fresh loop for the worker. Every subsequent request hits "attached to a different loop."

  2. Shutdown hangs. Connections opened at module import time are never explicitly closed. On shutdown, the process hangs until the OS force-kills the TCP socket. CI times out. Deploys look "slow" when they are actually broken.

  3. Worker sharing bugs. Under fork-based preloading (e.g. Gunicorn with --preload and --workers > 1), module globals are copied into each worker but already-open file descriptors are shared. Two workers can race on the same socket and corrupt responses.

  4. Test pollution. Module-level clients cannot be swapped in tests without monkey-patching the whole module, which is fragile and slow.

lifespan fixes all 4 by moving initialization into an explicit async function that runs per-worker, after the event loop is set up, with a guaranteed teardown path.

graph TD
    Start[Worker starts] --> Lifespan[lifespan startup]
    Lifespan --> LLM[init LLM client]
    Lifespan --> DB[init DB engine]
    Lifespan --> Redis[init Redis]
    Lifespan --> Yield[yield: app runs]
    Yield --> Stop[Worker shuts down]
    Stop --> Close1[close Redis]
    Close1 --> Close2[close LLM client]
    Close2 --> Close3[dispose DB engine]
    Close3 --> Done[Clean exit]

    style Lifespan fill:#dcfce7,stroke:#15803d
    style Done fill:#dbeafe,stroke:#1e40af

What does lifespan actually do?

lifespan is an async context manager you pass to FastAPI(lifespan=...). Everything before the yield runs at startup. Everything after the yield runs at shutdown. Between the two, the app is serving traffic.

# filename: app/main.py
# description: FastAPI lifespan initializes per-worker resources.
from contextlib import asynccontextmanager
from fastapi import FastAPI
from anthropic import AsyncAnthropic
from sqlalchemy.ext.asyncio import create_async_engine, async_sessionmaker
from redis.asyncio import Redis
from app.config import get_settings


@asynccontextmanager
async def lifespan(app: FastAPI):
    # ---- Startup (acquisition order: DB engine, then LLM client, then cache) ----
    settings = get_settings()

    app.state.db_engine = create_async_engine(
        str(settings.database_url),
        pool_size=20,
        pool_pre_ping=True,
        pool_recycle=1800,
    )
    app.state.session_factory = async_sessionmaker(app.state.db_engine, expire_on_commit=False)
    app.state.llm = AsyncAnthropic(api_key=settings.anthropic_api_key.get_secret_value())
    app.state.redis = Redis.from_url(str(settings.redis_url))  # from_url is synchronous; no await

    yield

    # ---- Shutdown (reverse order of acquisition) ----
    await app.state.redis.aclose()
    await app.state.llm.close()
    await app.state.db_engine.dispose()


app = FastAPI(lifespan=lifespan)

The teardown is in reverse order of acquisition. This matters: Redis was acquired last and nothing else's teardown should need it, so it closes first. The LLM client may make a final database write (usage logging, say) while closing, so the engine disposes after the client. Wrong order = shutdown hangs.

For the broader FastAPI + Uvicorn production stack, see the FastAPI and Uvicorn for production agentic AI systems post.

How do routes access lifespan resources?

Through request.app.state or via a Depends helper. app.state is the FastAPI convention for storing instances that should be accessible from any request.

# filename: app/routes/chat.py
# description: Routes access lifespan resources via Depends or request.app.state.
from fastapi import APIRouter, Request, Depends
from anthropic import AsyncAnthropic


router = APIRouter()


async def get_llm(request: Request) -> AsyncAnthropic:
    return request.app.state.llm


@router.post('/chat')
async def chat(body: dict, llm: AsyncAnthropic = Depends(get_llm)):
    reply = await llm.messages.create(
        model='claude-sonnet-4-6',
        max_tokens=1024,
        messages=[{'role': 'user', 'content': body['message']}],
    )
    return {'answer': reply.content[0].text}

The get_llm dependency is a 1-line helper. Tests can override it with a fake. Route handlers take a typed LLM client as a parameter without ever touching a module-level global.

For the dependency injection pattern that this fits into, see the FastAPI dependency injection for agentic API auth post.

Why does teardown order matter so much?

Because teardown in the wrong order leads to shutdown hangs that look identical to healthy deploys in dashboards but cause real problems: SIGKILL on every deploy, lost in-flight requests, and CI timeouts.

The rule: teardown in reverse order of acquisition. If A depends on B, A closes first. Concretely:

  1. Cache layers close first. Redis, in-memory caches. They were acquired last, and nothing else's teardown should need them.
  2. Application-level clients close next. LLM clients, HTTP clients, vector store clients.
  3. Database pools dispose after the clients, because a client might make a final DB write during its own teardown.
  4. Telemetry flushes at the very end. Langfuse, Sentry, OpenTelemetry go last so they capture errors from the earlier teardown steps.
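The four steps above can be sketched as a shutdown sequence where each close is isolated, so one failing step cannot skip the rest. FakeResource stands in for the real Redis/LLM/DB clients, and the telemetry step is a hypothetical exporter flush:

```python
# description: Each teardown step is isolated so one failure cannot skip the rest.
import asyncio
import contextlib

closed: list[str] = []


class FakeResource:
    def __init__(self, name: str, fail: bool = False):
        self.name, self.fail = name, fail

    async def close(self) -> None:
        if self.fail:
            raise RuntimeError(f'{self.name} refused to close')
        closed.append(self.name)


async def shutdown(redis, llm, db) -> None:
    # Reverse order of acquisition: cache -> app clients -> database.
    for resource in (redis, llm, db):
        with contextlib.suppress(Exception):
            await resource.close()
    closed.append('telemetry-flush')  # telemetry last, so it sees teardown errors


asyncio.run(shutdown(FakeResource('redis'), FakeResource('llm', fail=True), FakeResource('db')))
print(closed)  # ['redis', 'db', 'telemetry-flush']; the failing close did not block the rest
```

Suppressing exceptions per step is a deliberate choice for shutdown paths: a half-dead resource should be logged, not allowed to keep its neighbors open.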

What are the 3 bugs lifespan kills?

  1. "Event loop is closed" errors. Because resources are created on the running event loop, not on a loop that module import created and then discarded.

  2. Shutdown hangs. Because every resource has an explicit close in the shutdown block, sockets drain cleanly and the process exits in under a second instead of waiting for SIGKILL.

  3. Test pollution. Because resources are in app.state and not in module globals, tests can override them with app.state.llm = FakeLLM() or through dependency overrides without monkey-patching any module.

For the full production stack context including Gunicorn + Uvicorn worker configuration, see the FastAPI and Uvicorn post.

What to do Monday morning

  1. Grep your codebase for module-level client = Anthropic() or engine = create_engine(...). Every hit is a candidate for lifespan migration.
  2. Create the lifespan function. Initialize each resource inside it. Attach each to app.state.
  3. Add the teardown code after yield, in reverse order of acquisition, with an explicit close for every resource.
  4. Update route handlers to read from request.app.state or via Depends(get_*) helpers. Delete the module-level globals.
  5. Deploy. Watch the shutdown log. If the process takes more than 2 seconds to exit, something is wrong with your teardown order.
  6. Add a test that starts the app, makes a request, and shuts down cleanly. Confirm the whole lifecycle completes in under 5 seconds.

The headline: lifespan is the only right place for agent startup and shutdown. Module-level initialization is a footgun. The fix is 20 lines and kills 4 classes of bugs at once.

Frequently asked questions

What is FastAPI's lifespan and when do you use it?

lifespan is an async context manager passed to FastAPI(lifespan=...) that runs once at app startup and once at shutdown. Everything before yield is startup, everything after is shutdown. Use it for any resource that lives for the process lifetime: LLM clients, database engines, Redis pools, vector store clients, observability exporters. It replaces the deprecated @app.on_event decorators.

Why is module-level initialization broken for async services?

Because module imports run before the event loop exists. An async client instantiated at module level is either created on a temporary loop that gets discarded, or fails with "no running event loop." FastAPI creates the real loop per worker at startup, and resources must be initialized on that loop. lifespan runs after the loop exists, which is the only correct timing.

How do I access lifespan resources from a route?

Through request.app.state.<resource> or via a Depends helper that reads from app.state. The Depends pattern is cleaner because it makes the dependency explicit in the function signature and lets tests override it. Never reach into app.state directly from business logic; wrap it in a dependency or a service.

Why does teardown order matter?

Because resources depend on each other, and closing them in the wrong order causes hangs or errors. Rule: teardown in reverse order of acquisition. Caches first, application clients next, database pools after them, telemetry at the very end. The wrong order makes the process wait for SIGKILL on shutdown, which manifests as "slow deploys" in production.

Can I have multiple lifespan blocks in one app?

No, FastAPI takes exactly one lifespan context manager. If you need modular initialization, create helper async context managers and compose them inside the main lifespan. Use contextlib.AsyncExitStack for dynamic composition. One lifespan per app; compose inside it.
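The AsyncExitStack composition mentioned above can be sketched without any FastAPI machinery. The resource names are illustrative stand-ins for real per-resource init helpers, each written as an @asynccontextmanager:

```python
# description: Compose per-resource context managers inside one lifespan;
# the AsyncExitStack unwinds them in reverse order on shutdown.
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager

order: list[str] = []


def make_resource(name: str):
    @asynccontextmanager
    async def resource():
        order.append(f'open {name}')
        yield name
        order.append(f'close {name}')
    return resource


@asynccontextmanager
async def lifespan(app):
    async with AsyncExitStack() as stack:
        for name in ('db', 'llm', 'redis'):
            await stack.enter_async_context(make_resource(name)())
        yield  # app serves traffic; on exit the stack closes redis, llm, db


async def demo():
    async with lifespan(app=None):
        pass


asyncio.run(demo())
print(order)  # opens db, llm, redis; closes redis, llm, db
```

The stack gives you reverse-order teardown for free, which is exactly the discipline the teardown-order section asks for.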

Key takeaways

  1. Module-level initialization is a footgun for async services. Event loop mismatch, shutdown hangs, worker sharing bugs, and test pollution all trace back to it.
  2. FastAPI lifespan is the only right place for process-scoped resources: LLM clients, DB engines, Redis, vector stores, telemetry.
  3. Initialize before yield, teardown after yield in reverse order of acquisition. Cache → app clients → database → telemetry.
  4. Routes access resources via request.app.state or a Depends helper. Never via module globals.
  5. lifespan eliminates event-loop-mismatch errors, shutdown hangs, and test pollution at once. 20 lines to migrate.
  6. To see lifespan wired into a full production agent stack with auth, pools, and observability, walk through the Build your own coding agent course, or start with the AI Agents Fundamentals primer.

For the FastAPI lifespan documentation covering AsyncExitStack, sub-app composition, and advanced patterns, see the FastAPI lifespan guide. The doc explicitly recommends lifespan over the older event decorators.
