Build a real-time phone agent in one FastAPI process. Stream audio over WebRTC with FastRTC, transcribe with Whisper, think with your LLM of choice, speak back with streaming TTS, and handle barge-in like a human would.
Message a mentor about fit, prerequisites, or where to start. Replies come on WhatsApp, usually within a day.
Build a real-time phone voice agent that streams low-latency audio over WebRTC using FastRTC, pipes it through an LLM, and speaks back with natural-sounding TTS. Single FastAPI process, swappable providers, barge-in aware.
Ship a real-time phone voice agent over WebRTC with FastRTC, Whisper, and swappable TTS.
What you'll ship
What you'll learn
Curriculum
First connection
Boot the FastAPI app, confirm the FastRTC signaling path, and keep the channel alive
Text turn baseline
Run the whole agent loop in text mode first, then do the latency math the audio loop has to hit
WebRTC mount
Stream real audio frames through FastRTC and understand the SDP and ICE dance that makes it possible
STT seam
Plug Whisper streaming into the transcribe seam, gate it with VAD, and use partial transcripts to start thinking early
TTS seam
Wire a real TTS provider into the synthesize seam and stream chunks so playback starts fast
Phone persona
Shape a system prompt that produces short spoken replies and add prosody hints that make the agent feel human
Tool calling
Let the agent look up flight status mid-call and cover the lookup latency with a holder phrase so the caller never hears dead air
Barge-in and turn detection
Let the caller interrupt mid-reply, cancel the agent cleanly, and reset state so the next turn is coherent
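The turn loop the curriculum builds toward can be sketched in a few lines. This is a hedged sketch, not the course's actual code: the STT and TTS providers are mocked so it runs standalone, and the pattern shown is the asyncio one where each spoken reply is a cancellable task that a caller barge-in interrupts (all names here are hypothetical).

```python
import asyncio

async def synthesize(text):
    # Mock streaming TTS: yield "audio chunks" one word at a time.
    for word in text.split():
        await asyncio.sleep(0.01)  # simulate per-chunk synthesis latency
        yield word

async def speak(text, spoken):
    # Stream chunks to playback; cancelling this task mid-stream
    # models the caller barging in.
    async for chunk in synthesize(text):
        spoken.append(chunk)

async def turn(reply, interrupt_after):
    spoken = []
    task = asyncio.create_task(speak(reply, spoken))
    await asyncio.sleep(interrupt_after)  # stand-in for VAD detecting speech
    if not task.done():
        task.cancel()  # barge-in: stop playback immediately
        try:
            await task
        except asyncio.CancelledError:
            pass  # state is now clean for the next turn
    return spoken

# First turn is interrupted early; second turn plays out in full.
interrupted = asyncio.run(turn("the flight departs at nine tonight", 0.025))
complete = asyncio.run(turn("anything else", 1.0))
print(interrupted)
print(complete)
```

The point of the sketch is the cancellation seam: because playback is a single task, barge-in is one `cancel()` plus a state reset, which is what keeps the next turn coherent.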
Who it's for
who built a streaming chatbot and now need to answer an actual phone call
who understand LLMs but have never shipped real-time audio over WebRTC
who want one Python process instead of stitching together Twilio, a media server, and a separate agent host
FAQ
Why FastRTC instead of LiveKit?
FastRTC keeps the entire voice loop inside a single FastAPI process. No separate media server, no extra signaling hop, no SIP trunk to configure. LiveKit is excellent when you need multi-participant rooms and server-side recording, and you will understand exactly when to reach for it after this course.
Do I need a phone number or SIP trunk?
No. The agent answers from a browser over WebRTC, which is how most production voice UX starts today. You can add a SIP bridge later, but the hard part is the audio loop and the latency budget, not the phone number.
Which models and providers does it use?
The default LLM is whatever OpenRouter routes you to, and you can swap in Gemini, Fireworks, or OpenAI with one environment variable. STT uses OpenAI Whisper when a key is present, with a mock fallback so the server boots with zero keys. TTS is a swappable seam so you can plug in ElevenLabs, Deepgram Aura, or Edge-TTS.
How is this different from the LiveKit course?
The LiveKit course shows you how to build agents on top of managed real-time infrastructure. This course goes one layer down and shows you how the real-time loop actually works, inside one process, so you can debug latency and interruption behavior instead of treating them as magic.
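The one-environment-variable swap mentioned above can be sketched roughly like this. The variable name, base URLs, and model ids below are illustrative assumptions, not the course's actual defaults:

```python
import os

# Hypothetical presets: (OpenAI-compatible base URL, model id) per provider.
MODEL_PRESETS = {
    "openrouter": ("https://openrouter.ai/api/v1", "openrouter/auto"),
    "openai": ("https://api.openai.com/v1", "gpt-4o-mini"),
}

def resolve_llm(env=None):
    """Pick the LLM endpoint from one env var, defaulting to OpenRouter."""
    env = os.environ if env is None else env
    choice = env.get("VOICE_AGENT_LLM", "openrouter")
    return MODEL_PRESETS[choice]

# Swapping providers is just a different env var value at boot.
print(resolve_llm({}))                            # default: OpenRouter
print(resolve_llm({"VOICE_AGENT_LLM": "openai"}))  # one-variable swap
```

Because every preset exposes an OpenAI-compatible endpoint, the rest of the agent loop never changes when you swap, which is the same idea behind the mock STT fallback that lets the server boot with zero keys.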
Pricing
Subscribe to Pro for every paid course, or buy just this one.
Unlock this course and every paid course plus workshop replays. One subscription.
You save 54% with regional pricing
One-time purchase. Lifetime access to every lesson, exercise, and update.
You save 47% with regional pricing
Still deciding? Ask Param a question
Real-time phone agents with FastRTC
$79 one-time