Pydantic output structuring for RAG agent plans
Your RAG planner returns a different JSON shape every single call
You asked the LLM to "return a plan as JSON." On call 1 it returned {"steps": [...]}. On call 2 it returned {"plan": [...]}. On call 3 it wrapped the JSON in a markdown code fence. On call 4 it added a chatty preamble. Your parser broke 4 times in one afternoon.
The fix is not a stricter prompt. The fix is a Pydantic model that the LLM fills in, with structured output enforcement at the API level. The shape is guaranteed, the fields are validated, and the parser is free.
This post is the Pydantic-for-RAG-planner pattern: the schema that works, the structured-output call, the validation retry loop, and the 3 reasons plain JSON prompting keeps breaking in production.
Why does plain JSON prompting fail so often?
Because LLMs are trained to produce plausible text, not guaranteed schemas. 3 specific failure modes that every team hits:
- Shape drift. The same prompt produces `{"steps": [...]}` on Monday and `{"plan": [...]}` on Tuesday. Your code expects one key and breaks on the other.
- Markdown wrapping. The model decides to be helpful and wraps the JSON in triple backticks. Your `json.loads` fails on the first character.
- Chatty preambles. "Sure! Here is the JSON plan you requested: ..." Your parser sees text before the JSON and throws.
A Pydantic model plus structured output enforcement fixes all 3. The model is a contract, not a suggestion.
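The markdown-wrapping failure is easy to reproduce with nothing but the standard library; the fenced string below is a stand-in for a real model reply:

```python
import json

# A typical "helpful" LLM reply: valid JSON wrapped in a markdown code fence.
raw = '```json\n{"steps": [{"action": "search"}]}\n```'

try:
    json.loads(raw)
    parsed = True
except json.JSONDecodeError:
    parsed = False

print(parsed)  # False: json.loads dies on the first backtick
```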
```mermaid
graph LR
    Query[User query] --> Planner[Planner LLM]
    Schema[Pydantic RAGPlan] --> Planner
    Planner --> Raw[Structured output]
    Raw --> Validate[Pydantic validator]
    Validate -->|valid| Execute[Execute plan]
    Validate -->|invalid| Retry[Retry with error]
    Retry --> Planner
    style Schema fill:#dbeafe,stroke:#1e40af
    style Execute fill:#dcfce7,stroke:#15803d
```
What does the Pydantic schema look like?
Tight, explicit, and with field descriptions the LLM can read.
```python
# filename: app/rag/plan_schema.py
# description: Pydantic schema for a multi-step RAG plan.
from typing import Literal

from pydantic import BaseModel, Field


class RetrievalStep(BaseModel):
    step_id: int = Field(description="Sequential step number starting at 1")
    action: Literal["search", "rerank", "summarize", "answer"] = Field(
        description="The action to run for this step"
    )
    query: str = Field(description="The query or instruction for this step")
    depends_on: list[int] = Field(
        default_factory=list,
        description="IDs of earlier steps this step depends on",
    )


class RAGPlan(BaseModel):
    intent: str = Field(description="One-sentence summary of the user's intent")
    steps: list[RetrievalStep] = Field(min_length=1, max_length=6)
    confidence: float = Field(ge=0.0, le=1.0)
```
3 decisions worth pointing out. `Literal` on the `action` field restricts the LLM to 4 known actions; it cannot invent a new one. `min_length=1, max_length=6` bounds the plan size, which prevents 50-step runaway plans. `Field(description=...)` on every field gets injected into the schema the LLM sees, so the model understands what each field means without a separate prompt paragraph.
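A quick sanity check that the constraints actually bite — this re-declares the schema from above (descriptions trimmed for brevity) and feeds it two bad plans, assuming Pydantic v2:

```python
# Sketch: exercising the schema constraints (assumes pydantic v2 is installed).
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class RetrievalStep(BaseModel):
    step_id: int
    action: Literal["search", "rerank", "summarize", "answer"]
    query: str
    depends_on: list[int] = Field(default_factory=list)


class RAGPlan(BaseModel):
    intent: str
    steps: list[RetrievalStep] = Field(min_length=1, max_length=6)
    confidence: float = Field(ge=0.0, le=1.0)


# An invented action is rejected by the Literal constraint.
try:
    RAGPlan.model_validate({
        "intent": "test",
        "steps": [{"step_id": 1, "action": "hallucinate", "query": "q"}],
        "confidence": 0.9,
    })
except ValidationError:
    print("invented action rejected")

# An empty plan is rejected by min_length=1.
try:
    RAGPlan.model_validate({"intent": "test", "steps": [], "confidence": 0.5})
except ValidationError:
    print("empty plan rejected")
```

Both attempts raise `ValidationError` before any retrieval code runs, which is the whole point of the contract.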
How do you call the LLM with structured output?
The OpenAI SDK accepts a Pydantic model directly through its structured-output API; with the Anthropic SDK you get the same guarantee through forced tool use, generating the tool's input schema from the Pydantic model. Either way, no manual prompt engineering for the schema.
```python
# filename: app/rag/planner.py
# description: Call the planner LLM with enforced Pydantic output.
from anthropic import Anthropic

from app.rag.plan_schema import RAGPlan

client = Anthropic()

PLANNER_PROMPT = """You are a RAG planner. Given the user's query, return a multi-step plan to answer it.

User query: {query}

Think about what the query needs, then output a plan that matches the RAGPlan schema.
"""


def build_plan(query: str) -> RAGPlan:
    reply = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=800,
        tools=[{
            "name": "return_plan",
            "description": "Return a structured RAG plan",
            "input_schema": RAGPlan.model_json_schema(),
        }],
        tool_choice={"type": "tool", "name": "return_plan"},
        messages=[{"role": "user", "content": PLANNER_PROMPT.format(query=query)}],
    )
    return RAGPlan.model_validate(reply.content[0].input)
```
The trick is using tool use with `tool_choice` forcing the model to call `return_plan`. The tool's `input_schema` is generated from the Pydantic model. The result comes back as a dict that Pydantic validates. No JSON parsing, no markdown stripping, no preamble handling.
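It helps to see what the model actually receives. `model_json_schema()` emits standard JSON Schema: the `Literal` values surface as an enum the LLM must pick from, and the field descriptions travel with each property (sketch, assumes Pydantic v2):

```python
# Inspect the JSON Schema that becomes the tool's input_schema (pydantic v2).
from typing import Literal

from pydantic import BaseModel, Field


class RetrievalStep(BaseModel):
    step_id: int = Field(description="Sequential step number starting at 1")
    action: Literal["search", "rerank", "summarize", "answer"]
    query: str = Field(description="The query or instruction for this step")


schema = RetrievalStep.model_json_schema()

# The Literal constraint becomes a JSON Schema enum.
print(schema["properties"]["action"]["enum"])
# Every field without a default is marked required.
print(schema["required"])
```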
For the broader agentic RAG pattern that uses this planner as one node in a graph, see the Agentic RAG with LangGraph post.
What do you do when validation still fails?
Sometimes the LLM produces output that passes the tool schema but fails business-logic validation (e.g., a step depends on a step that does not exist). Handle it with a retry loop that feeds the error back.
```python
# filename: app/rag/planner_retry.py
# description: Retry planner with validation error feedback.
from pydantic import ValidationError

from app.rag.plan_schema import RAGPlan
from app.rag.planner import build_plan


def build_plan_with_retry(query: str, max_attempts: int = 2) -> RAGPlan:
    last_error = None
    for attempt in range(max_attempts):
        try:
            plan = build_plan(query)
            _check_dependencies(plan)
            return plan
        except (ValidationError, ValueError) as e:
            last_error = str(e)
            query = f"{query}\n\nPrevious attempt failed: {last_error}. Fix it."
    raise RuntimeError(f"Planner failed after {max_attempts} attempts: {last_error}")


def _check_dependencies(plan: RAGPlan) -> None:
    seen = set()
    for step in plan.steps:
        for dep in step.depends_on:
            if dep not in seen:
                raise ValueError(f"Step {step.step_id} depends on missing step {dep}")
        seen.add(step.step_id)
```
2 attempts is usually enough. Add the error text to the prompt so the LLM sees what broke and fixes it on the next try. Most failures are single-call issues that resolve on retry.
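The retry mechanics can be exercised without any LLM by substituting a flaky planner stub. Everything here — the stub, the plain-dict plans — is invented for illustration; in the real pipeline the stub is `build_plan` and the dicts are `RAGPlan` instances:

```python
# Illustration only: a stub planner that fails dependency validation once,
# then succeeds on the retry.
calls = {"n": 0}


def fake_build_plan(query: str) -> dict:
    calls["n"] += 1
    if calls["n"] == 1:
        # First attempt: step 2 depends on a step that does not exist.
        return {"steps": [{"step_id": 2, "depends_on": [9]}]}
    return {"steps": [{"step_id": 1, "depends_on": []}]}


def check_dependencies(plan: dict) -> None:
    seen = set()
    for step in plan["steps"]:
        for dep in step["depends_on"]:
            if dep not in seen:
                raise ValueError(f"step {step['step_id']} depends on missing step {dep}")
        seen.add(step["step_id"])


def build_with_retry(query: str, max_attempts: int = 2) -> dict:
    last_error = None
    for _ in range(max_attempts):
        try:
            plan = fake_build_plan(query)
            check_dependencies(plan)
            return plan
        except ValueError as e:
            last_error = str(e)
            query = f"{query}\n\nPrevious attempt failed: {last_error}. Fix it."
    raise RuntimeError(f"failed after {max_attempts} attempts: {last_error}")


plan = build_with_retry("compare X and Y")
print(calls["n"])  # 2: the first attempt failed validation, the second passed
```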
Why use Pydantic instead of raw JSON schema?
Because Pydantic gives you 3 things a raw JSON schema cannot:
- Python-side validation after the call. Even if the LLM returns a valid shape, you can add custom validators (e.g., `step_id`s must be unique) without touching the schema.
- Generated docs and type hints. Your planner function returns `RAGPlan`, not `dict`. Downstream code gets autocomplete and type checking.
- Schema evolution. Add a field to the Pydantic model and the JSON schema updates automatically. No drift between the model you describe to the LLM and the model your code uses.
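The custom-validator point in practice — a trimmed sketch using Pydantic v2's `field_validator` to enforce unique `step_id`s, a rule the raw JSON schema cannot express:

```python
# Sketch: business-logic validation beyond shape (assumes pydantic v2).
from pydantic import BaseModel, Field, ValidationError, field_validator


class Step(BaseModel):
    step_id: int


class Plan(BaseModel):
    steps: list[Step] = Field(min_length=1)

    @field_validator("steps")
    @classmethod
    def step_ids_unique(cls, steps: list[Step]) -> list[Step]:
        ids = [s.step_id for s in steps]
        if len(ids) != len(set(ids)):
            raise ValueError("step_ids must be unique")
        return steps


try:
    Plan.model_validate({"steps": [{"step_id": 1}, {"step_id": 1}]})
except ValidationError:
    print("duplicate step_ids rejected")
```

The validator runs inside `model_validate`, so the retry loop above catches it like any other `ValidationError`.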
For the evaluation framework that checks whether the generated plans actually work, see the RAGAs evaluation for RAG pipelines post.
When is this overkill?
When your RAG pipeline is a single retrieve-then-answer call. You do not need a plan for "fetch 5 chunks, pass to LLM." Pydantic planning earns its complexity when:
- The query needs multi-hop retrieval
- Different actions depend on earlier results
- You want traceable, auditable intermediate steps
- You are routing between multiple retrievers or tools
For simple single-hop RAG, skip the planner entirely. Direct retrieval is faster and cheaper.
What to do Monday morning
- Find the point in your RAG code where you parse LLM JSON into a plan or step list. That is where Pydantic belongs.
- Write the Pydantic model for your plan shape. Keep it under 5 fields per nested model.
- Switch the LLM call to tool-use mode with `tool_choice` forcing the planner tool. Drop the "return JSON" prompt language.
- Add a retry loop that feeds validation errors back to the LLM on failure. Cap at 2 attempts.
- Delete your old JSON parsing code. The Pydantic validator replaces it.
The headline: Pydantic plus structured output eliminates an entire class of RAG bugs. The schema is a contract, the LLM fills it in, the parser is free. Your planner stops crashing on Tuesday because you shipped on Monday.
Frequently asked questions
Why is Pydantic better than asking the LLM for JSON?
Because Pydantic plus structured output enforcement guarantees the shape at the API level, not at the prompt level. Plain JSON prompting produces shape drift, markdown wrapping, and chatty preambles, all of which break your parser. Pydantic validates every field, enforces type constraints, and gives you Python-side type hints for free. Your downstream code stops handling LLM quirks.
How do I use Pydantic with the Anthropic API?
Use tool use with `tool_choice` forcing your planner tool. Generate the tool's `input_schema` from `YourModel.model_json_schema()`. When the response comes back, call `YourModel.model_validate(reply.content[0].input)` to get a typed instance. The tool-use path is the official structured output mechanism and the LLM will only return shapes that match the schema.
What happens if the LLM still returns an invalid shape?
Catch the `ValidationError`, add the error message to the prompt, and retry. 2 attempts is usually enough; the LLM sees what broke and fixes it on the second try. If you hit 3+ failures on the same prompt, the schema is probably too strict or the prompt is unclear; tune the schema's description fields.
Do I need Pydantic for every LLM call?
No. Use it when the output is structured data you will parse programmatically (plans, tool inputs, extracted fields, classifications). Do not use it for free-form text generation like final answers or summaries; those should stay as plain strings.
Can I nest Pydantic models?
Yes, and you should. A `RAGPlan` containing a `list[RetrievalStep]` is a standard pattern. Nested models get their own JSON schema, the LLM fills them in correctly, and validation cascades. Keep nesting to 2 levels max to avoid overwhelming the model.
Key takeaways
- Plain JSON prompting produces shape drift, markdown wrapping, and chatty preambles. All 3 break parsers in production.
- Pydantic schemas plus structured output enforcement guarantee the shape at the API level, not the prompt level.
- Use `Literal`, `min_length`, `max_length`, and `Field(description=...)` to constrain what the LLM can return and explain each field without a separate prompt paragraph.
- Call the LLM in tool-use mode with `tool_choice` forcing your planner tool. Validate the result with `model_validate`.
- Retry on validation failure by adding the error text to the next prompt. Cap at 2 attempts to avoid runaway cost.
- To see Pydantic planning wired into a full production agentic RAG stack, walk through the Agentic RAG Masterclass, or start with the RAG Fundamentals primer.
For the official Pydantic structured output documentation with provider-specific examples, see the Pydantic AI docs.