Service layer for AI agents: decoupling logic from routes
Your route handler is the agent loop and you cannot test either
Open your agent's /chat route. It is 180 lines. It parses the request, validates auth, loads a database session, runs the agent loop, handles retries, serializes the response, and writes observability events. You cannot test the agent loop without also spinning up FastAPI, a database, and a real LLM client. Every bug fix is a diff across all 180 lines.
The fix is a service layer. The route handler shrinks to 5 lines: parse input, call a service function, return the response. The service function holds the agent loop, the business logic, and the orchestration. Each is testable on its own.
This post is the service layer pattern for AI agents: what belongs in it, what stays in the route, how to inject infrastructure for tests, and the 30-minute refactor from a fat route to a thin one.
Why does a fat route handler break testing and growth?
Because FastAPI route handlers are the wrong place for business logic. Routes have 2 jobs: parse input, format output. The moment an agent loop lives in a route handler, 3 things go wrong.
- Tests need the whole stack. You cannot unit-test the agent loop without FastAPI's TestClient, an auth dependency, and a real or mocked database session. Every test is an integration test.
- Reuse becomes impossible. A scheduled job that needs the same agent loop (for batch processing) has to either duplicate the route code or call the route over HTTP. Both are broken.
- Mocking is painful. The agent loop imports the LLM client at module top level. To mock it, you patch the module, which is fragile and slow.
A service layer fixes all 3 by putting the agent loop in a plain Python function that takes its dependencies as arguments. The route calls the function. A test calls the function with fakes. A background job calls the function directly.
```mermaid
graph LR
    Route[POST /chat route] --> Service[run_chat_turn service]
    CronJob[Batch job] --> Service
    Service --> Agent[Agent loop]
    Service --> DB[(Database)]
    Service --> LLM[LLM client]
    Test[Unit test] --> Service
    Test -.->|fakes| FDB[FakeDB]
    Test -.->|fakes| FLLM[FakeLLM]
    style Service fill:#dcfce7,stroke:#15803d
```
What belongs in the service layer?
The service layer owns business logic: what happens, in what order, with what fallbacks. It does NOT own:
- HTTP parsing (Routes)
- Prompt templates (Domain)
- Tool definitions (Domain)
- Database connection pool (Infrastructure)
- LLM HTTP client (Infrastructure)
- Pydantic request/response schemas (Domain)
It DOES own:
- The agent loop (turn sequence, tool dispatch, termination conditions)
- Session state management (load history, save turn, clean up)
- Retry and fallback orchestration
- Rate limiting at the request level
- Cost tracking and telemetry emission
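Retry and fallback orchestration is the item on that list that most often sneaks back into routes. A minimal sketch of what it looks like as a service-layer helper, with illustrative stand-in classes (`FlakyLLM`, `EchoLLM`, and `complete_with_fallback` are hypothetical names, not from the codebase above):

```python
# Sketch: retry-with-fallback orchestration in the service layer.
# The concrete classes below are illustrative fakes, not real clients.
import asyncio


async def complete_with_fallback(messages, primary, fallback, retries=2):
    """Try the primary LLM a few times with backoff, then fall back."""
    for attempt in range(retries):
        try:
            return await primary.complete(messages)
        except Exception:
            # Tiny exponential backoff between attempts.
            await asyncio.sleep(0.01 * 2 ** attempt)
    return await fallback.complete(messages)


class FlakyLLM:
    """Stand-in for a primary model that is currently failing."""
    async def complete(self, messages):
        raise TimeoutError('primary is down')


class EchoLLM:
    """Stand-in for a cheaper fallback model."""
    async def complete(self, messages):
        return 'fallback: ' + messages[-1]['content']
```

Because the helper only depends on the `complete` protocol, it slots under any service function without the route ever knowing a fallback exists.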
How do you write a service function?
A service function takes its dependencies explicitly as arguments. No implicit module-level globals, no calls to Settings() inside the function, no direct Anthropic() instantiation.
```python
# filename: app/services/chat.py
# description: The chat service orchestrates the agent loop.
# All dependencies arrive via parameters, making the function testable.
from dataclasses import dataclass
from typing import Protocol

from app.domain.prompts import SYSTEM_PROMPT


class LLMClient(Protocol):
    async def complete(self, messages: list[dict]) -> str: ...


class SessionStore(Protocol):
    async def load(self, session_id: str) -> list[dict]: ...
    async def save(self, session_id: str, messages: list[dict]) -> None: ...


@dataclass
class ChatResult:
    answer: str
    session_id: str
    token_count: int


async def run_chat_turn(
    user_id: str,
    message: str,
    session_id: str,
    llm: LLMClient,
    store: SessionStore,
) -> ChatResult:
    history = await store.load(session_id)
    history.append({'role': 'user', 'content': message})
    answer = await llm.complete([
        {'role': 'system', 'content': SYSTEM_PROMPT},
        *history,
    ])
    history.append({'role': 'assistant', 'content': answer})
    await store.save(session_id, history)
    # Word count is a rough stand-in for real token accounting.
    return ChatResult(answer=answer, session_id=session_id,
                      token_count=len(answer.split()))
```
The key moves: LLMClient and SessionStore are Protocol types, not concrete classes. The function takes them as parameters. Tests pass fakes. Routes pass real implementations. The agent loop inside run_chat_turn has zero knowledge of whether it is running behind FastAPI, a cron job, or a unit test.
For the modular architecture pattern that this service layer sits inside, see the Modular architectures for agentic AI post.
How does the route shrink after extraction?
Before (180 lines) → after (12 lines). The route becomes trivial:
```python
# filename: app/routes/chat.py
# description: Thin route handler that delegates to the service.
from fastapi import APIRouter, Depends

from app.services.chat import run_chat_turn
from app.schemas import ChatRequest, ChatResponse
from app.infra.llm import get_llm
from app.infra.db import get_session_store
from app.auth import get_auth

router = APIRouter()


@router.post('/chat', response_model=ChatResponse)
async def chat(
    body: ChatRequest,
    auth=Depends(get_auth),
    llm=Depends(get_llm),
    store=Depends(get_session_store),
) -> ChatResponse:
    result = await run_chat_turn(
        user_id=auth.user_id,
        message=body.message,
        session_id=body.session_id,
        llm=llm,
        store=store,
    )
    return ChatResponse(answer=result.answer, session_id=result.session_id)
```
Every dependency arrives via FastAPI's Depends. The route parses input, calls the service, formats output. No business logic leaks in. If the route breaks, it is because input parsing or response formatting is wrong, nothing else.
How do you test the service with fakes?
Because the service takes its dependencies as arguments, tests pass hand-written fakes instead of mocking modules.
```python
# filename: tests/services/test_chat.py
# description: Unit test for the chat service using fake LLM and store.
# Requires the pytest-asyncio plugin for the asyncio marker.
import pytest

from app.services.chat import run_chat_turn


class FakeLLM:
    def __init__(self, response='hello world'):
        self.calls = []
        self.response = response

    async def complete(self, messages):
        self.calls.append(messages)
        return self.response


class FakeStore:
    def __init__(self):
        self.sessions = {}

    async def load(self, session_id):
        return self.sessions.get(session_id, [])

    async def save(self, session_id, messages):
        self.sessions[session_id] = messages


@pytest.mark.asyncio
async def test_chat_turn_saves_history():
    llm = FakeLLM('hi from agent')
    store = FakeStore()
    result = await run_chat_turn(
        user_id='u1', message='hello', session_id='s1', llm=llm, store=store,
    )
    assert result.answer == 'hi from agent'
    assert len(store.sessions['s1']) == 2  # user + assistant
```

A dozen lines of test, no database, no HTTP, no real LLM. Runs in under 50 milliseconds. A suite of 100 service tests runs in under 5 seconds total.
For the full test strategy across all 4 layers, see the Modular architectures for agentic AI post.
When should you split a service into smaller services?
When a single service function crosses 100 lines or owns more than one concept. Split along use-case boundaries: chat_service.py for chat turns, session_service.py for session CRUD, evaluation_service.py for grading. Each service is a cohesive set of functions that share the same dependencies.
The rule I use: if a new engineer asks "where is the code that does X?", the service file should be findable by the concept name alone, not by reading the import graph.
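Under that rule, one plausible layout might look like this (the file names match the split above; the function names in the comments are illustrative, not from the post's codebase):

```
app/
  services/
    chat_service.py        # chat turns: run_chat_turn and variants
    session_service.py     # session CRUD: create, load, expire
    evaluation_service.py  # grading: run_eval, score_transcript
```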
For the broader production stack that this service layer sits inside, the Build your own coding agent course covers it alongside the tool registry and the observability layer. The free AI Agents Fundamentals primer is the right starting point if the agent loop is still new.
What to do Monday morning
- Open your biggest route handler. Count the lines. If it is over 40, extraction is worth it.
- Create app/services/{feature}.py. Move the business logic there. Write it as a function that takes its dependencies as explicit arguments.
- Replace the route handler body with a 5-line call to the service function. Inject dependencies via Depends.
- Write one unit test with fake implementations of the dependencies. Confirm the test runs in under 100 milliseconds without any network or database.
- Repeat for the next route. After 3 routes, the pattern becomes obvious and each subsequent extraction takes 10 minutes.
The headline: the service layer is where your agent loop wants to live. Routes become 12 lines. Tests become 50 milliseconds. Refactor takes a day. Shippable forever after.
Frequently asked questions
What is a service layer in an agentic AI codebase?
A service layer is a set of plain Python functions that hold the business logic of an agent, separate from HTTP routes and infrastructure. Routes parse input and format output. Services run the agent loop, manage sessions, call tools, and orchestrate retries. Infrastructure provides the LLM client and database. Splitting these three roles lets each be tested and changed in isolation.
Why not just put the agent loop in the route handler?
Because routes are not the right place for business logic. An agent loop in a route handler cannot be called from a batch job or a cron task without duplicating code, cannot be unit tested without FastAPI's TestClient, and conflates input parsing with orchestration. Moving the loop to a service function fixes all three and takes 30 minutes per feature.
How do I inject dependencies into a service function?
Pass them as function arguments. Declare a Protocol type for each dependency (LLM client, session store, cache) so the function is typed against the interface, not the concrete class. In routes, use FastAPI's Depends to wire real instances. In tests, pass hand-written fakes. No decorators, no DI containers, no magic.
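A minimal sketch of the route-side wiring, assuming a hypothetical concrete client (`AnthropicLLM` and `get_llm` are illustrative names, not from a real SDK):

```python
# Sketch: a provider function that FastAPI wires in via Depends(get_llm).
# AnthropicLLM is a hypothetical concrete class satisfying LLMClient.
from functools import lru_cache


class AnthropicLLM:
    """Concrete implementation of the LLMClient protocol."""
    async def complete(self, messages: list[dict]) -> str:
        raise NotImplementedError  # the real SDK call lives here


@lru_cache
def get_llm() -> AnthropicLLM:
    # lru_cache makes this a per-process singleton: FastAPI calls
    # get_llm() on each request, but every call returns the same client.
    return AnthropicLLM()
```

Tests never import this module; they pass a FakeLLM straight to the service function instead.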
How do I test a service function?
Write a fake for each dependency protocol. Call the service function directly with the fakes. Assert on the return value and the fake's recorded calls. A typical service test runs in under 100 milliseconds because there is no network, no database, and no real LLM involved. A full service test suite of 100 tests runs in under 5 seconds.
When should I split one service file into multiple?
When a single service file crosses 200 lines or owns more than one concept. Split along use-case boundaries: chat_service.py, session_service.py, evaluation_service.py. The rule of thumb: if a new engineer cannot find the code for a feature by guessing the file name from the feature name, the split is wrong.
Key takeaways
- A fat route handler is untestable, unreusable, and hard to grow. The fix is a service layer that owns business logic independently of HTTP.
- Service functions take their dependencies as arguments via Protocol types. Real implementations come from routes; fakes come from tests.
- Routes become 12 lines: parse input, call service, format output. All business logic lives in services.
- Tests become 50 milliseconds because they instantiate fakes instead of a database or LLM client. A 100-test suite runs in 5 seconds.
- Split services by use case: chat, session, evaluation. One file per concept, not one file per endpoint.
- To see the service layer wired into a full production agent stack with auth, tools, and observability, walk through the Build your own coding agent course, or start with the AI Agents Fundamentals primer.
For the Domain-Driven Design framing that this service layer borrows from, see Vaughn Vernon's treatment of application services in Implementing Domain-Driven Design. The pattern predates AI agents by 15 years and translates directly.