Building voice AI agents with LiveKit and Deepgram
Build a real-time voice AI agent with LiveKit, Deepgram, and tool calling.
Loading...
Voice AI sits at the intersection of three things that are each hard alone: real-time audio, speech recognition, and agent logic that responds in under a second. When any one of those drifts, the whole experience feels broken. The engineering is latency budgets, turn detection, context preservation across handoffs, and knowing when to interrupt.
Curated by Param Harrison
These courses cover the full stack most voice AI teams use: LiveKit and FastRTC for transport, Whisper and Deepgram for STT, TTS providers for the response voice, and agent frameworks that tie it together. You build things that sound like real products, not demo reels, with specific patterns for multi-agent voice triage and phone-line integration.
Showing 4 of 4 courses
Common questions
Not to start. LiveKit and FastRTC let you build and test voice agents entirely in the browser with WebRTC. Adding a phone number (Twilio or Telnyx) comes later. The real-time phone agents course shows that integration explicitly.
Whisper is the open-source baseline. Runs locally, no per-minute cost, lower accuracy on noisy audio. Deepgram is faster and more accurate on real-world audio but adds a provider dependency. Voice agents in production usually pick Deepgram; indie projects often stick with Whisper.
Turn detection. Each framework ships a voice activity detector that pauses TTS the moment it hears speech. The LiveKit courses cover the specific knobs (silence thresholds, interruption handling) that make conversations feel natural instead of rude.
Yes, and this is where voice AI gets interesting. The multi-agent voice systems course walks through a triage-to-specialist handoff where context and audio session stay continuous. The user never hears the seam.
Aim for under 800 ms end-to-end for a natural feel. That is STT plus LLM plus TTS plus network. Streaming helps a lot on the LLM side, and keeping the graph shallow keeps total time predictable.