API versioning for agentic services: practical guide

You broke the API and nobody will let you ship again

You added a required field to a response. Nothing crazy, just a metadata object that was nice-to-have on the way in and nice-to-have on the way out. Your CLI broke. The browser extension broke. A customer's Zapier integration broke. Support tickets land within the hour and the rollback happens before lunch. Your next PR gets blocked for a week of review because "we can't trust API changes anymore."

This is why API versioning exists. Without it, every API change is a potential breaking change, and the team gradually stops shipping because the risk is too high. With it, you can ship new functionality behind a new version while old clients continue working on the old one. It is not fancy, it is not exciting, but it is the foundation of a service that stays shippable for years.

This post is the versioning pattern I use for agentic APIs: URL prefix versioning, the 3 rules for backward compatibility inside a version, the deprecation playbook, and when it is actually time to cut v2 instead of squeezing another change into v1.

Why do agentic APIs need versioning?

Because clients outside your control will parse response shapes in ways you did not anticipate, and an "additive" change to you is a "broken field assumption" to them. 3 categories of clients that will break on any change:

Statically typed clients. Anything written in Go, TypeScript, Rust with strict parsing, or any code with a Pydantic model on the client side. If you add a new field, these clients might reject the response entirely depending on their strictness settings.
Third-party integrations. Zapier, n8n, custom scripts. These are written by people who cannot update quickly and who will blame you when anything changes.
Cached SDKs. Your own SDK published to npm or PyPI might be a version behind. New required fields in requests will break SDK users until they update.

Versioning lets you add a /v2/chat endpoint with the new shape while /v1/chat keeps the old shape. Clients migrate when they are ready. Nobody breaks.

graph LR
    Client1[Old client] -->|v1/chat| V1[v1 handler]
    Client2[New client] -->|v2/chat| V2[v2 handler]
    V1 --> Core[Core agent logic]
    V2 --> Core
    Core --> Service[Service layer]

    style V1 fill:#fef3c7,stroke:#b45309
    style V2 fill:#dbeafe,stroke:#1e40af
    style Core fill:#dcfce7,stroke:#15803d

The key insight: both versions share the core logic. Only the thin serialization layer at the top differs. You are not maintaining 2 copies of the agent, just 2 copies of the request and response shapes.

Why is URL prefix versioning the right choice?

Because it is explicit, cacheable, and trivial to debug. GET /v1/chat vs GET /v2/chat is legible in logs, in documentation, in browser history, and in client code. The alternative patterns (header-based versioning, media type versioning) are more "correct" according to REST purists and significantly worse in every practical dimension.

3 reasons URL prefix wins:

Greppable. grep /v1/ finds every client calling the old version across your monorepo in seconds. Header-based versioning requires parsing request objects to find the same.
Cacheable. CDNs and reverse proxies cache by URL. Separate URLs for separate versions mean separate cache entries by default, no custom cache key configuration.
Self-documenting. A URL with /v1/ in it tells anyone reading the request what contract is in play. Header versions hide this detail until you inspect the request.

The trade-off is aesthetic: URL prefixes "pollute" the URL with a version number. That is a non-problem. Every major API (Stripe, GitHub, Twilio) uses URL prefix versioning for these exact reasons.

How do you structure url-prefixed routers in FastAPI?

One router per version, mounted under its own prefix. Shared business logic lives in a service layer that both routers call. The versioning lives only at the router level.

# filename: app/main.py
# description: Mount v1 and v2 routers with URL prefixes.
# Shared service layer keeps business logic DRY.
from fastapi import FastAPI
from app.api.v1 import router as v1_router
from app.api.v2 import router as v2_router

app = FastAPI()
app.include_router(v1_router, prefix='/v1')
app.include_router(v2_router, prefix='/v2')

Each version module has its own schemas, its own route handlers, and its own dependency on the shared service layer.

# filename: app/api/v1/chat.py
# description: v1 chat route with the original request/response shape.
from fastapi import APIRouter
from app.services.chat import run_chat
from .schemas import ChatRequestV1, ChatResponseV1

router = APIRouter()


@router.post('/chat', response_model=ChatResponseV1)
async def chat_v1(body: ChatRequestV1) -> ChatResponseV1:
    result = await run_chat(
        message=body.message,
        session_id=body.session_id,
    )
    return ChatResponseV1(
        answer=result.answer,
        session_id=result.session_id,
    )

# filename: app/api/v2/chat.py
# description: v2 chat route with the new shape including tool calls
# and citations. Calls the same service.
from fastapi import APIRouter
from app.services.chat import run_chat
from .schemas import ChatRequestV2, ChatResponseV2

router = APIRouter()


@router.post('/chat', response_model=ChatResponseV2)
async def chat_v2(body: ChatRequestV2) -> ChatResponseV2:
    result = await run_chat(
        message=body.message,
        session_id=body.session_id,
        include_reasoning=body.include_reasoning or False,
    )
    return ChatResponseV2(
        answer=result.answer,
        session_id=result.session_id,
        tool_calls=result.tool_calls,
        citations=result.citations,
        reasoning=result.reasoning if body.include_reasoning else None,
    )

Both routes call run_chat from the service layer. The difference is the request and response shapes, plus the v2-specific parameter. The business logic is never duplicated.

For the schema separation pattern that keeps this clean, see the API Schemas: Separating DB Models from API Responses post. For the full production stack context, see FastAPI and Uvicorn for Production Agentic AI Systems.

What are the 3 rules for non-breaking changes inside a version?

Within a single version, you can make additive changes without breaking clients, as long as you follow 3 rules:

Never remove a field. Old clients still read it. Once a field is in v1, it is in v1 forever.
Never change a field's type or semantics. Changing user_id: int to user_id: str breaks every strictly-typed client. Changing an enum value from active to live breaks any client that hardcoded the string.
Never change a field from optional to required. Old clients that omit the field will start failing validation.

What IS allowed within a version:

Adding new optional fields to responses. Old clients ignore them; new clients use them.
Adding new optional fields to requests. Old clients do not send them; new clients do.
Adding new endpoints. They are not in v1 by definition until a client calls them.
Deprecating endpoints via response headers. The endpoint still works; the header signals removal is coming.

Following these rules means v1 can live for a long time while v2 is in development. Most "breaking" changes are actually additive and do not need a new version.

When is it actually time to ship v2?

When you have a change that cannot be made additively within v1. 3 examples:

A field needs to be removed. The old field is wrong, deprecated, or exposes something it should not. Additive deprecation cannot fix this because old clients still read the field.
A field's semantics need to change. The meaning of status=active changes from "currently running" to "recently seen." Old clients interpret the new value wrong.
The response shape fundamentally changes. v1 returned a flat object; v2 needs nested results. No amount of optional fields makes this backward-compatible.

For everything else, ship within v1. Cutting a new version has real costs: maintenance, documentation, client migration effort, support confusion. The goal is to ship v2 rarely, maybe once a year at most.

What is the deprecation playbook?

A 4-step process over 3 to 6 months:

Announce. Write a migration guide, post in docs and changelog, send to every known client contact. Give a sunset date at least 90 days out.
Header. Add a Deprecation: true and Sunset: <date> header to every response from the deprecated version. Some clients auto-detect these headers and surface warnings.
Log. Log every v1 call with the client's user agent and account ID. Reach out directly to the top 10 users by call volume.
Sunset. On the announced date, return 410 Gone for any v1 call with a message pointing to v2. Leave this in place for another 30 days before deleting the v1 code.

The key mistake teams make is skipping step 3. If you only announce and wait, you will be surprised by the clients you did not know about when the sunset date hits. Direct outreach to your top users cuts the surprise to near zero.

What to do Monday morning

Add /v1/ to every existing route, even if you have never versioned before. 15 minutes of router reorg. Every new client now has an explicit version they target.
Set up a v2 router skeleton that shares the service layer with v1. You do not need any v2 routes yet; just have the structure ready for the first change that cannot be additive.
Document the 3 rules for non-breaking changes in your API style guide. Every PR that modifies an API route should pass all 3.
Add a Deprecation: true header to any route you intend to remove in the next major version. Start the deprecation clock before you need to.
Write a one-page migration guide template. When v2 ships, fill it out in 30 minutes. Clients will love you for it.

The headline: URL prefix versioning plus 3 rules for additive changes lets you ship API improvements without breaking anyone. Cut a new version only when no additive change can cover the requirement. Most changes never need v2.

Frequently asked questions

Why do APIs need versioning?

Because clients outside your control will break on changes you consider additive. A new required field, a renamed enum value, a changed type, any of these can break typed clients, third-party integrations, and outdated SDKs. Versioning lets you ship new shapes behind a new version while old clients keep working on the old one. It is the foundation of a service that stays shippable over years.

Should I version my API in the URL or in a header?

URL prefix. It is greppable, cacheable by default, self-documenting in logs and docs, and trivial to debug. Header-based versioning is more REST-pure and significantly worse in every practical dimension. Every major API company uses URL prefix versioning (Stripe, GitHub, Twilio) for these reasons.

What changes are safe to make within a single API version?

Additive changes only: new optional fields in requests and responses, new endpoints, deprecation headers. Never remove a field, never change a field's type or semantics, never make an optional field required. Following these rules means most "new features" ship inside v1 without needing a new version.

When should I cut a v2 instead of patching v1?

When you have a change that cannot be made additively: a field must be removed, a field's meaning must change, or the response shape fundamentally changes. Everything else is additive and belongs in v1. The goal is to cut v2 rarely, maybe once a year, because the maintenance and client migration cost is real.

How long should a deprecated API version stay live?

At least 90 days after announcement, longer if you have high-value enterprise clients on it. The deprecation playbook is: announce with a migration guide, add Deprecation and Sunset response headers, log calls and reach out to top users directly, then return 410 Gone after the announced date. Give clients real time to migrate.

Key takeaways

Unversioned APIs force breaking changes to either ship at huge blast radius or never ship at all. Versioning is the foundation of continued shipping velocity.
URL prefix versioning (/v1/, /v2/) is the right choice because it is greppable, cacheable, and self-documenting. Header-based is fashionable and worse.
Within a version, follow the 3 additive rules: never remove fields, never change types or semantics, never make optional fields required. Most changes fit inside these rules.
Cut v2 only when no additive change can cover the requirement. Maintenance cost of multiple versions is real; minimize it.
Deprecation takes 90+ days with announcement, headers, direct outreach, and a final 410 Gone. Skipping steps creates surprise breakage at sunset.
To see versioning wired into a full production agentic API with auth, streaming, and observability, walk through the Build Your Own Coding Agent course, or start with the AI Agents Fundamentals primer.

For the HTTP deprecation header standard and best practices, see the IETF draft on the Deprecation header. The sunset policies documented there map directly onto the playbook in this post.

You broke the API and nobody will let you ship again

Why do agentic APIs need versioning?

Statically typed clients. Anything written in Go, TypeScript, Rust with strict parsing, or any code with a Pydantic model on the client side. If you add a new field, these clients might reject the response entirely depending on their strictness settings.
Third-party integrations. Zapier, n8n, custom scripts. These are written by people who cannot update quickly and who will blame you when anything changes.
Cached SDKs. Your own SDK published to npm or PyPI might be a version behind. New required fields in requests will break SDK users until they update.

Versioning lets you add a /v2/chat endpoint with the new shape while /v1/chat keeps the old shape. Clients migrate when they are ready. Nobody breaks.

graph LR
    Client1[Old client] -->|v1/chat| V1[v1 handler]
    Client2[New client] -->|v2/chat| V2[v2 handler]
    V1 --> Core[Core agent logic]
    V2 --> Core
    Core --> Service[Service layer]

    style V1 fill:#fef3c7,stroke:#b45309
    style V2 fill:#dbeafe,stroke:#1e40af
    style Core fill:#dcfce7,stroke:#15803d

Why is URL prefix versioning the right choice?

3 reasons URL prefix wins:

Greppable. grep /v1/ finds every client calling the old version across your monorepo in seconds. Header-based versioning requires parsing request objects to find the same.
Cacheable. CDNs and reverse proxies cache by URL. Separate URLs for separate versions mean separate cache entries by default, no custom cache key configuration.
Self-documenting. A URL with /v1/ in it tells anyone reading the request what contract is in play. Header versions hide this detail until you inspect the request.

The trade-off is aesthetic: URL prefixes "pollute" the URL with a version number. That is a non-problem. Every major API (Stripe, GitHub, Twilio) uses URL prefix versioning for these exact reasons.

How do you structure url-prefixed routers in FastAPI?

One router per version, mounted under its own prefix. Shared business logic lives in a service layer that both routers call. The versioning lives only at the router level.

# filename: app/main.py
# description: Mount v1 and v2 routers with URL prefixes.
# Shared service layer keeps business logic DRY.
from fastapi import FastAPI
from app.api.v1 import router as v1_router
from app.api.v2 import router as v2_router

app = FastAPI()
app.include_router(v1_router, prefix='/v1')
app.include_router(v2_router, prefix='/v2')

Each version module has its own schemas, its own route handlers, and its own dependency on the shared service layer.

# filename: app/api/v1/chat.py
# description: v1 chat route with the original request/response shape.
from fastapi import APIRouter
from app.services.chat import run_chat
from .schemas import ChatRequestV1, ChatResponseV1

router = APIRouter()


@router.post('/chat', response_model=ChatResponseV1)
async def chat_v1(body: ChatRequestV1) -> ChatResponseV1:
    result = await run_chat(
        message=body.message,
        session_id=body.session_id,
    )
    return ChatResponseV1(
        answer=result.answer,
        session_id=result.session_id,
    )

# filename: app/api/v2/chat.py
# description: v2 chat route with the new shape including tool calls
# and citations. Calls the same service.
from fastapi import APIRouter
from app.services.chat import run_chat
from .schemas import ChatRequestV2, ChatResponseV2

router = APIRouter()


@router.post('/chat', response_model=ChatResponseV2)
async def chat_v2(body: ChatRequestV2) -> ChatResponseV2:
    result = await run_chat(
        message=body.message,
        session_id=body.session_id,
        include_reasoning=body.include_reasoning or False,
    )
    return ChatResponseV2(
        answer=result.answer,
        session_id=result.session_id,
        tool_calls=result.tool_calls,
        citations=result.citations,
        reasoning=result.reasoning if body.include_reasoning else None,
    )

Both routes call run_chat from the service layer. The difference is the request and response shapes, plus the v2-specific parameter. The business logic is never duplicated.

What are the 3 rules for non-breaking changes inside a version?

Within a single version, you can make additive changes without breaking clients, as long as you follow 3 rules:

Never remove a field. Old clients still read it. Once a field is in v1, it is in v1 forever.
Never change a field's type or semantics. Changing user_id: int to user_id: str breaks every strictly-typed client. Changing an enum value from active to live breaks any client that hardcoded the string.
Never change a field from optional to required. Old clients that omit the field will start failing validation.

What IS allowed within a version:

Adding new optional fields to responses. Old clients ignore them; new clients use them.
Adding new optional fields to requests. Old clients do not send them; new clients do.
Adding new endpoints. They are not in v1 by definition until a client calls them.
Deprecating endpoints via response headers. The endpoint still works; the header signals removal is coming.

Following these rules means v1 can live for a long time while v2 is in development. Most "breaking" changes are actually additive and do not need a new version.

When is it actually time to ship v2?

When you have a change that cannot be made additively within v1. 3 examples:

A field needs to be removed. The old field is wrong, deprecated, or exposes something it should not. Additive deprecation cannot fix this because old clients still read the field.
A field's semantics need to change. The meaning of status=active changes from "currently running" to "recently seen." Old clients interpret the new value wrong.
The response shape fundamentally changes. v1 returned a flat object; v2 needs nested results. No amount of optional fields makes this backward-compatible.

What is the deprecation playbook?

A 4-step process over 3 to 6 months:

Announce. Write a migration guide, post in docs and changelog, send to every known client contact. Give a sunset date at least 90 days out.
Header. Add a Deprecation: true and Sunset: <date> header to every response from the deprecated version. Some clients auto-detect these headers and surface warnings.
Log. Log every v1 call with the client's user agent and account ID. Reach out directly to the top 10 users by call volume.
Sunset. On the announced date, return 410 Gone for any v1 call with a message pointing to v2. Leave this in place for another 30 days before deleting the v1 code.

What to do Monday morning

Add /v1/ to every existing route, even if you have never versioned before. 15 minutes of router reorg. Every new client now has an explicit version they target.
Set up a v2 router skeleton that shares the service layer with v1. You do not need any v2 routes yet; just have the structure ready for the first change that cannot be additive.
Document the 3 rules for non-breaking changes in your API style guide. Every PR that modifies an API route should pass all 3.
Add a Deprecation: true header to any route you intend to remove in the next major version. Start the deprecation clock before you need to.
Write a one-page migration guide template. When v2 ships, fill it out in 30 minutes. Clients will love you for it.

Unversioned APIs force breaking changes to either ship at huge blast radius or never ship at all. Versioning is the foundation of continued shipping velocity.
URL prefix versioning (/v1/, /v2/) is the right choice because it is greppable, cacheable, and self-documenting. Header-based is fashionable and worse.
Within a version, follow the 3 additive rules: never remove fields, never change types or semantics, never make optional fields required. Most changes fit inside these rules.
Cut v2 only when no additive change can cover the requirement. Maintenance cost of multiple versions is real; minimize it.
Deprecation takes 90+ days with announcement, headers, direct outreach, and a final 410 Gone. Skipping steps creates surprise breakage at sunset.
To see versioning wired into a full production agentic API with auth, streaming, and observability, walk through the Build Your Own Coding Agent course, or start with the AI Agents Fundamentals primer.

For the HTTP deprecation header standard and best practices, see the IETF draft on the Deprecation header. The sunset policies documented there map directly onto the playbook in this post.

API versioning for agentic services: a practical guide

Share this post

Share this post

Continue Reading

Which language should you build Redis in? Lessons from rebuilding it 6 times

Query anonymization for RAG bias mitigation

pip vs uv vs poetry for Python AI services

Weekly Bytes of AI

Ready to go deeper?

API versioning for agentic services: a practical guide

Share this post

Share this post

Continue Reading

Which language should you build Redis in? Lessons from rebuilding it 6 times

Query anonymization for RAG bias mitigation

pip vs uv vs poetry for Python AI services

Weekly Bytes of AI

Ready to go deeper?