Production RAG: handling edge cases and failures
In our last post, we learned how to measure our RAG agent's quality. We built a "golden set" and used Ragas to score our bot.
But if you've ever moved a "working" demo to production, only to watch it crash 10 minutes later, this post is for you.
In the real world, things break. APIs time out. Third-party services go down. LLMs get rate-limited. If your agent is just a "happy path" script, it's not a production system. It's a liability.
Today, we'll build a Resilient Agent that can handle real-world chaos using fallbacks, retries, and graceful degradation.
Why do agents become brittle?
Our "happy path" agent works great... if the world is perfect. But what happens when our "web search" tool times out?
```mermaid
sequenceDiagram
    participant User
    participant Agent
    participant Web_Search_Tool
    User->>Agent: "What's the news on Project-Z?"
    activate Agent
    Agent->>Web_Search_Tool: search("Project-Z")
    activate Web_Search_Tool
    note right of Web_Search_Tool: ... (API times out after 30s) ...
    Web_Search_Tool-->>Agent: [X ERROR 504: Gateway Timeout]
    deactivate Web_Search_Tool
    Agent->>Agent: [CRASH]
    Agent-->>User: {"error": "Internal Server Error"}
    deactivate Agent
```
Why this is bad:
- The User gets a broken app. This is the worst possible experience.
- The Agent is brittle. A single, common network error brought down our entire system.
A production agent must be resilient. It needs a "Plan B."
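Sometimes "Plan B" is just trying "Plan A" again: many failures (timeouts, rate limits) are transient. Before we get to fallback graphs, here is a minimal retry-with-backoff sketch in plain Python. The `flaky_search` tool is a made-up stand-in that fails twice before succeeding:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.01):
    """Call fn(); on a transient failure, retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of retries: let the caller's fallback take over
            time.sleep(base_delay * (2 ** attempt))

# A mock tool that times out twice, then succeeds
calls = {"n": 0}
def flaky_search():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("API timed out")
    return ["Fact from Paid API: ..."]

print(with_retries(flaky_search))  # Succeeds on the third attempt
```

Retries only help with transient errors, though. When the tool is genuinely down, you need the fallback logic below.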
What is a graceful fallback graph?
We can't prevent all errors, but we can handle them. We will stop thinking in "chains" and start thinking in "graphs" with conditional logic.
We will build an agent with this logic:
- Try Tool A (e.g., our primary, high-quality paid search API).
- Did it work?
- Yes: Great, go to "Generate Answer."
- No (Timeout/Error): Don't crash. Go to "Plan B."
- Try Tool B (e.g., a free, less reliable web search tool).
- Did that work?
- Yes: Great, go to "Generate Answer" (with the Plan B data).
- No: Don't crash. Go to "Plan C."
- Plan C: Generate a graceful failure message.
This is called Graceful Degradation.
```mermaid
graph TD
    A[Start] --> B("Try Tool A: Paid Search API")
    B --> C{Success?}
    C -- "Yes" --> D[Generate Answer]
    C -- "No (e.g., Timeout)" --> E("Try Tool B: Free Web Search")
    E --> F{Success?}
    F -- "Yes" --> D
    F -- "No (e.g., Timeout)" --> G["Generate Graceful Error: Sorry, I can't search right now."]
    D --> H[End]
    G --> H
    style B fill:#e3f2fd,stroke:#0d47a1
    style E fill:#fff8e1,stroke:#f57f17
    style G fill:#ffebee,stroke:#b71c1c
    style D fill:#e8f5e9,stroke:#388e3c
```
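Stripped of any framework, this flowchart is just nested try/except blocks. A minimal sketch with stand-in tools (Plan A is hard-coded to fail here so you can see the fallback fire):

```python
def paid_search(query):
    raise TimeoutError("504 Gateway Timeout")  # Simulate Plan A failing

def free_search(query):
    return ["Fact from Free Search: ..."]

def answer(query):
    try:
        context = paid_search(query)                   # Plan A
    except Exception:
        try:
            context = free_search(query)               # Plan B
        except Exception:
            return "Sorry, I can't search right now."  # Plan C
    return f"Answer based on: {context}"

print(answer("Project-Z"))  # Falls back to Plan B
```

Nested try/except works for two tools, but it gets unreadable fast. That is exactly why we'll express the same logic as a graph next.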
How do you build fallbacks with LangGraph?
We can build this exact logic using LangGraph. We'll define our "state" and our "nodes," but this time, our nodes will include try/except blocks.
Brick 1: the "memory" (GraphState)
Our "memory" needs to hold the question and a context that might be filled by either Tool A or Tool B.
```python
# filename: example.py
# description: Code example from the post.
from typing import TypedDict, List

class GraphState(TypedDict):
    question: str
    context: List[str]
    error_message: str  # To store what went wrong
```
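Why can nodes return only the keys they changed? LangGraph merges each node's returned dict into the running state, and without a custom reducer the default behavior is to overwrite key by key. In plain Python, that merge is essentially `dict.update()`, which we can demonstrate without the framework:

```python
from typing import TypedDict, List

class GraphState(TypedDict):
    question: str
    context: List[str]
    error_message: str

# A node returns only the keys it changed; LangGraph merges them
# into the running state. dict.update() models the default merge:
state = {"question": "What is Model-V?", "context": [], "error_message": ""}
node_output = {"context": ["Fact from Paid API: ..."], "error_message": ""}
state.update(node_output)
print(state["context"])   # The update landed
print(state["question"])  # Untouched keys survive
```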
Brick 2: the "nodes" (with error handling)
Now, we build our nodes. This time, they don't just "run"; they "try to run."
```python
# Our (fictional) tools
def paid_search_tool(query: str) -> List[str]:
    # This tool is great, but it might fail
    if "fail" in query:  # A mock failure
        raise TimeoutError("API timed out after 30 seconds")
    return ["Fact from Paid API: ..."]

def free_search_tool(query: str) -> List[str]:
    # This is our cheap, reliable fallback
    return ["Fact from Free Search: ..."]

# --- Node 1: Try Tool A ---
def try_tool_a(state):
    print("---NODE: Trying Tool A (Paid Search)---")
    try:
        context = paid_search_tool(state["question"])
        return {"context": context, "error_message": None}
    except Exception as e:
        print(f"Tool A failed: {e}")
        return {"context": [], "error_message": str(e)}

# --- Node 2: Try Tool B (The Fallback) ---
def try_tool_b(state):
    print("---NODE: Trying Tool B (Free Search)---")
    # Our simple fallback tool is very reliable
    context = free_search_tool(state["question"])
    return {"context": context, "error_message": None}

# --- Node 3: The Final "Safety Net" ---
def generate_error_message(state):
    print("---NODE: All tools failed. Gracefully failing.---")
    return {"context": [f"I'm sorry, my search tools are currently offline. The error was: {state['error_message']}"]}
```
Observation: Our nodes are now "smart." They catch errors and update the GraphState instead of crashing the program.
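Because the nodes are plain Python functions, you can sanity-check this behavior without building any graph. The definitions below are condensed copies of the ones above (print statements dropped):

```python
from typing import List

def paid_search_tool(query: str) -> List[str]:
    if "fail" in query:  # Same mock failure as above
        raise TimeoutError("API timed out after 30 seconds")
    return ["Fact from Paid API: ..."]

def try_tool_a(state):
    try:
        context = paid_search_tool(state["question"])
        return {"context": context, "error_message": None}
    except Exception as e:
        return {"context": [], "error_message": str(e)}

# A healthy query fills context; a failing one records the error instead
print(try_tool_a({"question": "What is Model-V?"}))
print(try_tool_a({"question": "fail this query"}))
```

Either way, the function returns normally. The failure becomes data in the state, which is what lets the graph route around it.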
Brick 3: the "wires" (the conditional logic)
Now, we wire it all up in our LangGraph workflow.
```python
from langgraph.graph import StateGraph, END

# --- The "Decision" Edges ---
def check_tool_a_success(state):
    # Did the first tool work?
    if state["error_message"] is None:
        return "generate"     # Yes, go straight to the answer
    else:
        return "try_tool_b"   # No, trigger Plan B

def check_tool_b_success(state):
    # (In a real app, we'd check again, but for this demo
    # we'll assume Tool B always works or we'll fail)
    if state["context"]:
        return "generate"
    else:
        return "fail_gracefully"

# --- Build the Graph ---
workflow = StateGraph(GraphState)
workflow.add_node("try_tool_a", try_tool_a)
workflow.add_node("try_tool_b", try_tool_b)
workflow.add_node("fail_gracefully", generate_error_message)
workflow.add_node("generate", ...)  # Our final LLM generator node

# --- Set the Logic Flow ---
workflow.set_entry_point("try_tool_a")

# The first critical decision
workflow.add_conditional_edges(
    "try_tool_a",
    check_tool_a_success,
    {
        "generate": "generate",
        "try_tool_b": "try_tool_b",
    },
)

# The second critical decision
workflow.add_conditional_edges(
    "try_tool_b",
    check_tool_b_success,
    {
        "generate": "generate",
        "fail_gracefully": "fail_gracefully",
    },
)

# The final paths
workflow.add_edge("generate", END)
workflow.add_edge("fail_gracefully", "generate")  # We still go to 'generate' to show the user the error

app = workflow.compile()
```
Result: We've built a resilient agent!
- If we send `{"question": "What is Model-V?"}`, it follows `try_tool_a` -> `generate`.
- If we send `{"question": "fail this query"}`, it follows `try_tool_a` -> (fails) -> `try_tool_b` -> `generate`.
Our bot no longer crashes. It degrades gracefully.
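If you want to trace both of those paths before wiring up LangGraph, the same nodes and routing checks can run in a tiny hand-rolled loop. The `generate` function below is a hypothetical stub for the LLM generator node the post leaves as `...`:

```python
from typing import List

def paid_search_tool(query: str) -> List[str]:
    if "fail" in query:  # Mock failure, as in the post
        raise TimeoutError("API timed out after 30 seconds")
    return ["Fact from Paid API: ..."]

def free_search_tool(query: str) -> List[str]:
    return ["Fact from Free Search: ..."]

def try_tool_a(state):
    try:
        return {**state, "context": paid_search_tool(state["question"]), "error_message": None}
    except Exception as e:
        return {**state, "context": [], "error_message": str(e)}

def try_tool_b(state):
    return {**state, "context": free_search_tool(state["question"]), "error_message": None}

def generate(state):  # Hypothetical stub for the real LLM generator node
    return {**state, "answer": f"Based on: {state['context'][0]}"}

def run(state):
    state = try_tool_a(state)
    if state["error_message"] is not None:  # Same check as check_tool_a_success
        state = try_tool_b(state)
    return generate(state)

print(run({"question": "What is Model-V?"})["answer"])  # Served by Tool A
print(run({"question": "fail this query"})["answer"])   # Falls back to Tool B
```

The user gets an answer either way; only the data source degrades.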
Challenge for you
- Use Case: Our current logic falls back on any error.
- The Problem: What if `try_tool_a` fails with a 401 Unauthorized (bad API key) error? Falling back to `try_tool_b` wastes time and money; the real problem is our key.
- Your Task: How would you modify the `check_tool_a_success` logic so it doesn't fall back on a 401? (Hint: The function can return more than two strings. What if it returned `"fail_fast"` and you added a new node for that?)
Frequently asked questions
How do you handle API failures in production RAG agents?
Implement fallback chains using conditional routing. Try your primary tool first. If it times out or returns an error, don't propagate the failure upward, route to a secondary tool instead. Use LangGraph to catch exceptions and update your state accordingly. Even a lower-quality fallback is better than a crashed agent. This is graceful degradation: you degrade the response quality, not the system.
Should you retry or fallback when a RAG tool fails?
Not all failures warrant retries. A 401 Unauthorized or invalid API key error is permanent, not transient. Retrying wastes money and time. Build decision nodes that distinguish error types before deciding whether to retry. Return states like 'retry', 'fallback', or 'fail' from your tool handlers. Route a timeout to 'retry once, then fallback'. Route auth errors straight to 'fallback' or 'fail'.
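One way to sketch that triage is a small classifier that routers can call before deciding where to send the graph. The error-message matching below is an illustrative assumption; in a real client you would inspect structured status codes or exception types instead:

```python
def classify_failure(exc: Exception) -> str:
    """Map a failure to a routing decision: 'retry', 'fallback', or 'fail'."""
    msg = str(exc).lower()
    if "401" in msg or "unauthorized" in msg or "invalid api key" in msg:
        return "fail"      # Permanent: retrying won't fix a bad key
    if "timeout" in msg or "503" in msg or "504" in msg:
        return "retry"     # Transient: worth one retry, then fall back
    return "fallback"      # Unknown: try the cheaper tool

print(classify_failure(TimeoutError("504 Gateway Timeout")))  # retry
print(classify_failure(PermissionError("401 Unauthorized")))  # fail
```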
How do you implement graceful degradation in a RAG system?
Define your failure paths as first-class routes in your graph, not error-handling afterthoughts. Map out Plan A (primary tool), Plan B (fallback tool), and Plan C (graceful failure message) before you code. Use LangGraph nodes that wrap tool calls in try-catch blocks and return explicit states. Route based on those states, not exception types. This way, errors flow through your graph like any other path.
For the full reference, see the Anthropic agents guide.
Key takeaways
- Production systems need error handling: happy-path code will fail in production; you must handle timeouts, rate limits, and service outages
- Fallback strategies prevent crashes: When Tool A fails, gracefully try Tool B instead of crashing
- Graceful degradation maintains UX: Even when tools fail, provide a helpful error message instead of a generic 500 error
- Conditional edges enable resilience: LangGraph's conditional edges let you route based on success/failure, creating self-healing agents
- Error state is part of state: Store error messages in your GraphState so downstream nodes can make informed decisions
For more on building resilient systems, see our concurrency and resilience guide.
For more on building production AI systems, check out our AI Bootcamp for Software Engineers.
Take the next step
- RAG Fundamentals Workshop: build resilient RAG pipelines with error handling, hands-on
Continue Reading
Ready to go deeper?
Go beyond articles. Build production AI systems with hands-on workshops and our intensive AI Bootcamp.