Topic

LLM

Explore our latest articles and insights about LLMs.

28 posts in total

LLM Engineering

Choosing the LLM judge for evaluation pipelines

How to pick the LLM that grades your LLM. The cost-quality tradeoffs, the calibration check, and why a weaker judge is sometimes the right call.

Evaluation, LLM, +3
8 min
LLM Engineering

Hallucination testing for RAG pipelines

How to test a RAG pipeline for hallucinations systematically. Adversarial prompts, the out-of-scope set, and the metric that catches confabulation.

RAG, Evaluation, +3
8 min
LLM Engineering

Fact-checking RAG answers: grounding with verification

How to fact-check RAG answers with a second LLM pass that verifies every claim against the retrieved context. The prompt, the rejection rule, and the loop.

RAG, LLM, +3
8 min
LLM Engineering

Query rewriting in RAG with LLMs: the rewrite loop

How LLM-powered query rewriting fixes vague user questions before retrieval. The prompt, the multi-query fan-out, and when rewriting hurts more than helps.

RAG, LLM, +3
8 min
LLM Engineering

LLM-based content filtering for RAG pipelines

How to filter irrelevant retrieved chunks with a cheap LLM call before the final answer. The prompt, the batch pattern, and the 40 percent noise reduction.

RAG, LLM, +3
8 min
AI Engineering in Practice

Agent cost optimization from trace data

How to use Langfuse trace data to find where your agent burns tokens. The 4 queries, the cost-per-user view, and the 50 percent savings patterns.

Observability, AI Agents, +3
9 min
LLM Engineering

LLM judges: enforcing reasoning with explicit rationales

Why LLM judges without explicit reasoning drift, and how chain-of-thought rationales make their scores defensible. The prompt, the parser, the trust.

Evaluation, LLM, +3
9 min
LLM Engineering

LLM-as-a-judge: production evaluation framework for agents

How to build an LLM-as-a-judge evaluation framework for agentic AI. The prompt, the rubric, the bias controls, and the loop that catches regressions.

Evaluation, LLM, +3
9 min
AI Engineering

Circuit breakers for LLM calls: stop cascading failures

How circuit breakers prevent LLM outages from cascading through your agent. The 3 states, the failure window, and the 50-line implementation.

AI Agents, LLM, +3
11 min
AI Engineering in Practice

Resilient LLM services with Tenacity and fallback models

How to survive LLM provider outages with Tenacity retries and fallback models. The retry policy, the fallback chain, and the 60-line pattern.

AI Agents, LLM, +3
9 min
AI Engineering

Context window management for production AI agents

How to manage context windows in production AI agents. The 4 strategies that keep long sessions bounded without losing critical context.

AI Agents, LLM, +3
11 min
LLM Engineering

Chain-of-thought reasoning in RAG: a practical guide

How to add chain-of-thought reasoning to a RAG pipeline. The prompt, the parsing, and the cases where CoT beats a straight answer by a wide margin.

RAG, LLM, +3
11 min
