Production RAG: handling edge cases and failures
In our last post, we learned how to measure our RAG agent's quality. We built a "golden set" and used Ragas to score our bot.
But if you've ever moved a "working" demo to production, only to watch it crash 10 minutes later, this post is for you.
In the real world, things break. APIs time out. Third-party services go down. LLMs get rate-limited. If your agent is just a "happy path" script, it's not a production system. It's a liability.
Today, we'll build a Resilient Agent that can handle real-world chaos using fallbacks, retries, and graceful degradation.
Why do agents become brittle?
Our "happy path" agent works great... if the world is perfect. But what happens when our "web search" tool times out?
```mermaid
sequenceDiagram
    participant User
    participant Agent
    participant Web_Search_Tool
    User->>Agent: "What's the news on Project-Z?"
    activate Agent
    Agent->>Web_Search_Tool: search("Project-Z")
    activate Web_Search_Tool
    note right of Web_Search_Tool: ... (API times out after 30s) ...
    Web_Search_Tool-->>Agent: [X ERROR 504: Gateway Timeout]
    deactivate Web_Search_Tool
    Agent->>Agent: [CRASH]
    Agent-->>User: {"error": "Internal Server Error"}
    deactivate Agent
```
Why this is bad:
- The User gets a broken app. This is the worst possible experience.
- The Agent is brittle. A single, common network error brought down our entire system.
A production agent must be resilient. It needs a "Plan B."
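Sometimes "Plan B" is just trying "Plan A" again: many failures (timeouts, rate limits) are transient. Before we get to fallback graphs, here is a minimal retry-with-backoff sketch in plain Python. The `flaky_search` tool is a made-up stand-in that fails twice before succeeding:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.01):
    """Call fn(); on a transient failure, retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of retries: let the caller's fallback take over
            time.sleep(base_delay * (2 ** attempt))

# A mock tool that times out twice, then succeeds
calls = {"n": 0}
def flaky_search():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("API timed out")
    return ["Fact from Paid API: ..."]

print(with_retries(flaky_search))  # Succeeds on the third attempt
```

Retries only help with transient errors, though. When the tool is genuinely down, you need the fallback logic below.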
What is a graceful fallback graph?
We can't prevent all errors, but we can handle them. We will stop thinking in "chains" and start thinking in "graphs" with conditional logic.
We will build an agent with this logic:
- Try Tool A (e.g., our primary, high-quality paid search API).
- Did it work?
- Yes: Great, go to "Generate Answer."
- No (Timeout/Error): Don't crash. Go to "Plan B."
- Try Tool B (e.g., a free, less reliable web search tool).
- Did that work?
- Yes: Great, go to "Generate Answer" (with the Plan B data).
- No: Don't crash. Go to "Plan C."
- Plan C: Generate a graceful failure message.
This is called Graceful Degradation.
```mermaid
graph TD
    A[Start] --> B("Try Tool A: Paid Search API")
    B --> C{Success?}
    C -- "Yes" --> D[Generate Answer]
    C -- "No (e.g., Timeout)" --> E("Try Tool B: Free Web Search")
    E --> F{Success?}
    F -- "Yes" --> D
    F -- "No (e.g., Timeout)" --> G["Generate Graceful Error: Sorry, I can't search right now."]
    D --> H[End]
    G --> H
    style B fill:#e3f2fd,stroke:#0d47a1
    style E fill:#fff8e1,stroke:#f57f17
    style G fill:#ffebee,stroke:#b71c1c
    style D fill:#e8f5e9,stroke:#388e3c
```
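Stripped of any framework, this flowchart is just nested try/except blocks. A minimal sketch with stand-in tools (Plan A is hard-coded to fail here so you can see the fallback fire):

```python
def paid_search(query):
    raise TimeoutError("504 Gateway Timeout")  # Simulate Plan A failing

def free_search(query):
    return ["Fact from Free Search: ..."]

def answer(query):
    try:
        context = paid_search(query)                   # Plan A
    except Exception:
        try:
            context = free_search(query)               # Plan B
        except Exception:
            return "Sorry, I can't search right now."  # Plan C
    return f"Answer based on: {context}"

print(answer("Project-Z"))  # Falls back to Plan B
```

Nested try/except works for two tools, but it gets unreadable fast. That is exactly why we'll express the same logic as a graph next.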
How do you build fallbacks with LangGraph?
We can build this exact logic using LangGraph. We'll define our "state" and our "nodes," but this time, our nodes will include try/except blocks.
Brick 1: the "memory" (GraphState)
Our "memory" needs to hold the question and a context that might be filled by either Tool A or Tool B.
```python
# filename: example.py
# description: Code example from the post.
from typing import TypedDict, List

class GraphState(TypedDict):
    question: str
    context: List[str]
    error_message: str  # To store what went wrong
```
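Why can nodes return only the keys they changed? LangGraph merges each node's returned dict into the running state, and without a custom reducer the default behavior is to overwrite key by key. In plain Python, that merge is essentially `dict.update()`, which we can demonstrate without the framework:

```python
from typing import TypedDict, List

class GraphState(TypedDict):
    question: str
    context: List[str]
    error_message: str

# A node returns only the keys it changed; LangGraph merges them
# into the running state. dict.update() models the default merge:
state = {"question": "What is Model-V?", "context": [], "error_message": ""}
node_output = {"context": ["Fact from Paid API: ..."], "error_message": ""}
state.update(node_output)
print(state["context"])   # The update landed
print(state["question"])  # Untouched keys survive
```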
Brick 2: the "nodes" (with error handling)
Now, we build our nodes. This time, they don't just "run"; they "try to run."
```python
# Our (fictional) tools
def paid_search_tool(query: str) -> List[str]:
    # This tool is great, but it might fail
    if "fail" in query:  # A mock failure
        raise TimeoutError("API timed out after 30 seconds")
    return ["Fact from Paid API: ..."]

def free_search_tool(query: str) -> List[str]:
    # This is our cheap, reliable fallback
    return ["Fact from Free Search: ..."]

# --- Node 1: Try Tool A ---
def try_tool_a(state):
    print("---NODE: Trying Tool A (Paid Search)---")
    try:
        context = paid_search_tool(state["question"])
        return {"context": context, "error_message": None}
    except Exception as e:
        print(f"Tool A failed: {e}")
        return {"context": [], "error_message": str(e)}

# --- Node 2: Try Tool B (The Fallback) ---
def try_tool_b(state):
    print("---NODE: Trying Tool B (Free Search)---")
    # Our simple fallback tool is very reliable
    context = free_search_tool(state["question"])
    return {"context": context, "error_message": None}

# --- Node 3: The Final "Safety Net" ---
def generate_error_message(state):
    print("---NODE: All tools failed. Gracefully failing.---")
    return {"context": [f"I'm sorry, my search tools are currently offline. The error was: {state['error_message']}"]}
```
Observation: Our nodes are now "smart." They catch errors and update the GraphState instead of crashing the program.
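Because the nodes are plain Python functions, you can sanity-check this behavior without building any graph. The definitions below are condensed copies of the ones above (print statements dropped):

```python
from typing import List

def paid_search_tool(query: str) -> List[str]:
    if "fail" in query:  # Same mock failure as above
        raise TimeoutError("API timed out after 30 seconds")
    return ["Fact from Paid API: ..."]

def try_tool_a(state):
    try:
        context = paid_search_tool(state["question"])
        return {"context": context, "error_message": None}
    except Exception as e:
        return {"context": [], "error_message": str(e)}

# A healthy query fills context; a failing one records the error instead
print(try_tool_a({"question": "What is Model-V?"}))
print(try_tool_a({"question": "fail this query"}))
```

Either way, the function returns normally. The failure becomes data in the state, which is what lets the graph route around it.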
Brick 3: the "wires" (the conditional logic)
Now, we wire it all up in our LangGraph workflow.
```python
from langgraph.graph import StateGraph, END

# --- The "Decision" Edges ---
def check_tool_a_success(state):
    # Did the first tool work?
    if state["error_message"] is None:
        return "generate"     # Yes, go straight to the answer
    else:
        return "try_tool_b"   # No, trigger Plan B

def check_tool_b_success(state):
    # (In a real app, we'd check again, but for this demo
    # we'll assume Tool B always works or we'll fail)
    if state["context"]:
        return "generate"
    else:
        return "fail_gracefully"

# --- Build the Graph ---
workflow = StateGraph(GraphState)
workflow.add_node("try_tool_a", try_tool_a)
workflow.add_node("try_tool_b", try_tool_b)
workflow.add_node("fail_gracefully", generate_error_message)
workflow.add_node("generate", ...)  # Our final LLM generator node

# --- Set the Logic Flow ---
workflow.set_entry_point("try_tool_a")

# The first critical decision
workflow.add_conditional_edges(
    "try_tool_a",
    check_tool_a_success,
    {
        "generate": "generate",
        "try_tool_b": "try_tool_b",
    },
)

# The second critical decision
workflow.add_conditional_edges(
    "try_tool_b",
    check_tool_b_success,
    {
        "generate": "generate",
        "fail_gracefully": "fail_gracefully",
    },
)

# The final paths
workflow.add_edge("generate", END)
workflow.add_edge("fail_gracefully", "generate")  # We still go to 'generate' to show the user the error

app = workflow.compile()
```
Result: We've built a resilient agent!
- If we send `{"question": "What is Model-V?"}`, it follows `try_tool_a` -> `generate`.
- If we send `{"question": "fail this query"}`, it follows `try_tool_a` -> (fails) -> `try_tool_b` -> `generate`.
Our bot no longer crashes. It degrades gracefully.
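If you want to trace both of those paths before wiring up LangGraph, the same nodes and routing checks can run in a tiny hand-rolled loop. The `generate` function below is a hypothetical stub for the LLM generator node the post leaves as `...`:

```python
from typing import List

def paid_search_tool(query: str) -> List[str]:
    if "fail" in query:  # Mock failure, as in the post
        raise TimeoutError("API timed out after 30 seconds")
    return ["Fact from Paid API: ..."]

def free_search_tool(query: str) -> List[str]:
    return ["Fact from Free Search: ..."]

def try_tool_a(state):
    try:
        return {**state, "context": paid_search_tool(state["question"]), "error_message": None}
    except Exception as e:
        return {**state, "context": [], "error_message": str(e)}

def try_tool_b(state):
    return {**state, "context": free_search_tool(state["question"]), "error_message": None}

def generate(state):  # Hypothetical stub for the real LLM generator node
    return {**state, "answer": f"Based on: {state['context'][0]}"}

def run(state):
    state = try_tool_a(state)
    if state["error_message"] is not None:  # Same check as check_tool_a_success
        state = try_tool_b(state)
    return generate(state)

print(run({"question": "What is Model-V?"})["answer"])  # Served by Tool A
print(run({"question": "fail this query"})["answer"])   # Falls back to Tool B
```

The user gets an answer either way; only the data source degrades.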
Challenge for you
- Use Case: Our current logic falls back on any error.
- The Problem: What if `try_tool_a` fails with a 401 Unauthorized (bad API key) error? Falling back to `try_tool_b` wastes time and money; the real problem is our key.
- Your Task: How would you modify the `check_tool_a_success` logic so it doesn't fall back on a 401? (Hint: The function can return more than two strings. What if it returned `"fail_fast"` and you added a new node for that?)
Frequently asked questions
How do you handle API failures in production RAG agents?
Implement fallback chains using conditional routing. Try your primary tool first. If it times out or returns an error, don't propagate the failure upward, route to a secondary tool instead. Use LangGraph to catch exceptions and update your state accordingly. Even a lower-quality fallback is better than a crashed agent. This is graceful degradation: you degrade the response quality, not the system.
Should you retry or fallback when a RAG tool fails?
Not all failures warrant retries. A 401 Unauthorized or invalid API key error is permanent, not transient. Retrying wastes money and time. Build decision nodes that distinguish error types before deciding whether to retry. Return states like 'retry', 'fallback', or 'fail' from your tool handlers. Route a timeout to 'retry once, then fallback'. Route auth errors straight to 'fallback' or 'fail'.
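One way to sketch that triage is a small classifier that routers can call before deciding where to send the graph. The error-message matching below is an illustrative assumption; in a real client you would inspect structured status codes or exception types instead:

```python
def classify_failure(exc: Exception) -> str:
    """Map a failure to a routing decision: 'retry', 'fallback', or 'fail'."""
    msg = str(exc).lower()
    if "401" in msg or "unauthorized" in msg or "invalid api key" in msg:
        return "fail"      # Permanent: retrying won't fix a bad key
    if "timeout" in msg or "503" in msg or "504" in msg:
        return "retry"     # Transient: worth one retry, then fall back
    return "fallback"      # Unknown: try the cheaper tool

print(classify_failure(TimeoutError("504 Gateway Timeout")))  # retry
print(classify_failure(PermissionError("401 Unauthorized")))  # fail
```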
How do you implement graceful degradation in a RAG system?
Define your failure paths as first-class routes in your graph, not error-handling afterthoughts. Map out Plan A (primary tool), Plan B (fallback tool), and Plan C (graceful failure message) before you code. Use LangGraph nodes that wrap tool calls in try-catch blocks and return explicit states. Route based on those states, not exception types. This way, errors flow through your graph like any other path.
For the full reference, see the Anthropic agents guide.
Key takeaways
- Production systems need error handling: happy-path code will fail in production; you must handle timeouts, rate limits, and service outages
- Fallback strategies prevent crashes: When Tool A fails, gracefully try Tool B instead of crashing
- Graceful degradation maintains UX: Even when tools fail, provide a helpful error message instead of a generic 500 error
- Conditional edges enable resilience: LangGraph's conditional edges let you route based on success/failure, creating self-healing agents
- Error state is part of state: Store error messages in your GraphState so downstream nodes can make informed decisions
For more on building resilient systems, see our concurrency and resilience guide.
For more on building production AI systems, check out our AI Bootcamp for Software Engineers.
Take the next step
- RAG Fundamentals Workshop: build resilient RAG pipelines with error handling, hands-on
Continue Reading
Ready to go deeper?
Go beyond articles. Build production AI systems with hands-on workshops and our intensive AI Bootcamp.