LLM Observability with Arize Phoenix
Emit JSON logs with request IDs and OpenTelemetry spans across every tool call so you can click a trace and see exactly what your agent did.
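A minimal sketch of the idea using only the Python standard library, assuming you want one JSON object per log line. A contextvar carries the request ID, and a hypothetical span context manager stands in for an OpenTelemetry span by logging each step's name, status, and duration. This is not the OpenTelemetry or Phoenix API. In a real setup you would export actual OTel spans, but the shape of the data is the same. run_agent and its tool output are hypothetical placeholders.

```python
import contextvars
import json
import logging
import time
import uuid
from contextlib import contextmanager

# Request ID for the current agent run; contextvars keeps it correct across threads and async tasks.
request_id_var = contextvars.ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "msg": record.getMessage(),
            "request_id": request_id_var.get(),
        }
        # Structured fields passed through logging's extra= argument.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


logger = logging.getLogger("agent")
_handler = logging.StreamHandler()
_handler.setFormatter(JsonFormatter())
logger.addHandler(_handler)
logger.setLevel(logging.INFO)


@contextmanager
def span(name, **attrs):
    """Hypothetical stand-in for an OpenTelemetry span: logs name, status and duration on exit."""
    start = time.perf_counter()
    status = "ok"
    try:
        yield
    except Exception:
        status = "error"
        raise
    finally:
        duration_ms = round((time.perf_counter() - start) * 1000, 2)
        fields = {"span": name, "status": status, "duration_ms": duration_ms, **attrs}
        logger.info("span", extra={"fields": fields})


def run_agent(question):
    """Hypothetical agent: one search tool call, then one LLM call, all under one request ID."""
    token = request_id_var.set(uuid.uuid4().hex)
    try:
        with span("agent.run", question=question):
            with span("tool.search", query=question):
                docs = ["doc-1", "doc-2"]  # placeholder tool output
            with span("llm.generate", n_docs=len(docs)):
                return f"answer based on {len(docs)} docs"
    finally:
        # Restore the previous request ID so it never leaks into the next run.
        request_id_var.reset(token)
```

Because every line from one run carries the same request_id, filtering the logs by that ID replays what the agent did, step by step.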
LLM observability is the sleep-at-night layer. Without it, you don't know when your prompt regressed, your token costs doubled, or your RAG retrieval started picking garbage. With it, you ship changes weekly and know within minutes if something's off.
Curated by Param Harrison
These courses focus on a widely used open-source stack: Phoenix for tracing, evaluation patterns you can run in CI, cost dashboards, and the minimum instrumentation needed to debug a failing agent or a drifting embedding index.
Common questions
You don't need a commercial tool to start. Arize Phoenix runs locally or can be self-hosted, and it covers most of what small teams need. Commercial tools become worth it at scale for eval datasets, team workflows, and SOC compliance.
Pick a fixed test set, define measurable output properties (exact match, rubric score, semantic similarity, tool-call sequence), run against every version of your prompt or chain, and alert when a metric regresses. The course builds this from scratch.
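A minimal sketch of that loop, assuming your prompt or chain is wrapped as a function that returns the answer text and the list of tool names it called. Only two of the listed metrics are shown (exact match and tool-call sequence); rubric scoring and semantic similarity need an LLM judge or an embedding model and are left out. Case, run_eval, and find_regressions are hypothetical names, and the 0.05 tolerance is an arbitrary placeholder.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One fixture in the fixed test set (hypothetical schema)."""
    question: str
    expected_answer: str
    expected_tools: list


def exact_match(output, expected):
    # Case- and whitespace-insensitive exact match, scored 1.0 or 0.0.
    return float(output.strip().lower() == expected.strip().lower())


def tool_sequence_match(called, expected):
    # 1.0 only if the agent called exactly the expected tools, in order.
    return float(list(called) == list(expected))


def run_eval(chain, cases):
    """Run every fixture through `chain` and return the mean score per metric."""
    totals = {"exact_match": 0.0, "tool_sequence": 0.0}
    for case in cases:
        answer, tools = chain(case.question)
        totals["exact_match"] += exact_match(answer, case.expected_answer)
        totals["tool_sequence"] += tool_sequence_match(tools, case.expected_tools)
    return {name: total / len(cases) for name, total in totals.items()}


def find_regressions(baseline, current, tolerance=0.05):
    """Return {metric: (baseline, current)} for metrics that dropped more than `tolerance`."""
    return {
        name: (baseline[name], current.get(name, 0.0))
        for name in baseline
        if current.get(name, 0.0) < baseline[name] - tolerance
    }
```

Store the scores from the last known-good version as the baseline, run this in CI on every change, and fail the build or alert when find_regressions returns anything non-empty.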
Start with one line of tracing per LLM call plus a weekly eval run on 20 fixture questions. That alone catches ~80% of regressions. Add cost tracking and alerting after you've been burned once.
LLM provider SDKs typically return token-usage metadata with every response. Log it, tag it by feature, and dashboard the weekly total. The course covers the Phoenix setup plus a simple Postgres + Grafana alternative.
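A minimal sketch of the log, tag, and aggregate steps, assuming you have already mapped your SDK's usage fields (their names differ between providers) onto input and output token counts. The model name "model-a" and its prices are hypothetical placeholders, and the in-memory aggregation stands in for the GROUP BY query a Postgres + Grafana setup would run.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical prices in USD per million tokens; substitute your provider's real rates.
PRICE_PER_MTOK = {"model-a": {"input": 3.00, "output": 15.00}}


@dataclass
class UsageRecord:
    """One logged LLM call, normalized from the SDK's usage metadata."""
    day: date
    feature: str  # tag such as "search" or "summarize"
    model: str
    input_tokens: int
    output_tokens: int


def cost_usd(rec):
    price = PRICE_PER_MTOK[rec.model]
    return (rec.input_tokens * price["input"] + rec.output_tokens * price["output"]) / 1_000_000


def weekly_cost_by_feature(records):
    """Sum cost per (ISO year, ISO week, feature), the shape a weekly dashboard panel needs."""
    totals = defaultdict(float)
    for rec in records:
        year, week, _ = rec.day.isocalendar()
        totals[(year, week, rec.feature)] += cost_usd(rec)
    return dict(totals)
```

Plotting one series per feature from these totals makes a week where one feature's cost doubles stand out immediately.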
The day you ship to real users. Before that, local tracing is plenty.