Topic

Production AI

Explore our latest articles and insights about Production AI.

Learn Production AI hands-on. Master every major agent design pattern with production code.

62 posts in total

LLM Engineering

Query anonymization for RAG bias mitigation

How to strip names, roles, and demographics from queries before retrieval to reduce RAG bias. The redaction pipeline and the 3 leakage traps to avoid.

RAG, Guardrails, +3

9 min

AI Engineering in Practice

pip vs uv vs poetry for Python AI services

Which Python dependency manager should you use for production agent services in 2026? The install speed, lockfile story, and Docker build times compared.

Python, AI Agents, +3

9 min

AI Engineering in Practice

Retry patterns for LLM API errors in production

How to build retry logic that handles rate limits, timeouts, and transient failures without burning money. The backoff rules and the 3 errors you must not retry.

AI Agents, Error Handling, +3

8 min

LLM Engineering

Choosing the LLM judge for evaluation pipelines

How to pick the LLM that grades your LLM. The cost-quality tradeoffs, the calibration check, and why a weaker judge is sometimes the right call.

Ground truth vs relevancy in RAG evaluation

Why ground truth and relevancy measure different things in RAG evals. When to use each, how to build both datasets, and the 2 metrics that matter most.

Hallucination testing for RAG pipelines

How to test a RAG pipeline for hallucinations systematically. Adversarial prompts, the out-of-scope set, and the metric that catches confabulation.

Testing and evaluating RAG pipelines end to end

How to test a RAG pipeline like real software. Unit, integration, and eval tests that catch regressions before they ship. The 3-layer test strategy.

Fact-checking RAG answers: grounding with verification

How to fact-check RAG answers with a second LLM pass that verifies every claim against the retrieved context. The prompt, the rejection rule, and the loop.

LLM-based content filtering for RAG pipelines

How to filter irrelevant retrieved chunks with a cheap LLM call before the final answer. The prompt, the batch pattern, and the 40 percent noise reduction.