Take a working medallion notebook and turn it into a scheduled, validated, API-fronted data platform that survives retries, backfills, and a real on-call rotation.
Message a mentor about fit, prerequisites, or where to start. Replies come on WhatsApp, usually within a day.
Graduate from laptop notebooks to a real platform. Orchestrate batch ingestion with Airflow, transform with Spark, land data in a Postgres star schema, validate with Great Expectations, and ship a FastAPI control plane that triggers runs and serves the warehouse. The middle tier of the learnwithparam data engineering track.
Orchestrate batch pipelines with Airflow and Spark, gate them with Great Expectations, and front the warehouse with a FastAPI control plane that triggers DAGs and serves datasets.
What you'll ship
What you'll learn
Curriculum
From laptop to platform
See why a notebook stops scaling and frame the four concerns a real data platform separates: orchestration, compute, storage, and control
Build the batch pipeline
Wire the Airflow DAG that extracts from MySQL, validates the extract with Great Expectations, and runs a Spark job that transforms it into a Postgres star schema (sketched below)
FastAPI control plane
Front the platform with a typed FastAPI service, configured with pydantic-settings, that triggers DAGs, exposes health checks, and reads from the warehouse (see the second sketch below)
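To make the DAG shape concrete, here is a minimal sketch of the kind of pipeline this module builds. It assumes Airflow 2.4+ (for the `schedule` argument) with the Spark provider installed; the DAG id, callables, and job path are illustrative placeholders, not the course's actual code:

```python
# Illustrative only: extract from MySQL, gate with a validation step,
# then hand off to Spark. Assumes apache-airflow-providers-apache-spark.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator


def extract_orders(**context):
    # Pull the day's partition out of MySQL and stage it (e.g. to MinIO).
    ...


def validate_orders(**context):
    # Run a Great Expectations checkpoint against the staged extract;
    # raising here fails the task and stops the downstream transform.
    ...


with DAG(
    dag_id="orders_batch",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_orders)
    validate = PythonOperator(task_id="validate", python_callable=validate_orders)
    transform = SparkSubmitOperator(
        task_id="transform",
        application="/opt/jobs/build_star_schema.py",  # hypothetical path
        conn_id="spark_default",
    )
    extract >> validate >> transform
```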
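And a sketch of the control-plane shape, assuming the Airflow 2 stable REST API with basic auth; the settings fields, route, and credentials are illustrative, not the course's actual service:

```python
# Illustrative control plane: trigger a DAG run and report health.
import httpx
from fastapi import FastAPI
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # Values come from the environment via pydantic-settings.
    airflow_base_url: str = "http://localhost:8080"
    airflow_user: str = "airflow"
    airflow_password: str = "airflow"


settings = Settings()
app = FastAPI()


@app.get("/health")
def health() -> dict:
    return {"status": "ok"}


@app.post("/runs/{dag_id}")
def trigger_run(dag_id: str) -> dict:
    # Airflow 2 stable REST API: POST /api/v1/dags/{dag_id}/dagRuns
    resp = httpx.post(
        f"{settings.airflow_base_url}/api/v1/dags/{dag_id}/dagRuns",
        json={"conf": {}},
        auth=(settings.airflow_user, settings.airflow_password),
    )
    resp.raise_for_status()
    return resp.json()
```

Because the configuration lives in pydantic-settings, the same service points at docker-compose locally and at a managed Airflow later with nothing but environment changes.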
Who it's for
Engineers who finished the medallion notebook and now need to put it on a schedule without duct tape
Anyone maintaining a growing pile of cron jobs who wants to move to a real orchestrator
AI engineers wiring data ingestion for RAG and features, realizing cron plus bash cannot scale
Tech leads asked to host data pipelines who want a reference stack they can teach to the team
FAQ
Do I need a cloud account or a Kubernetes cluster to follow along?
No. The whole stack runs on docker-compose on a laptop. Airflow, Spark, Postgres, MinIO, FastAPI, and Great Expectations all spin up locally with one command. The shapes match production, so the port is mostly mechanical when you graduate to Kubernetes or a managed service.
Why not just use dbt?
dbt handles the transform and tests, which is one slice of what we cover. Airflow owns orchestration, Spark owns heavy compute, and FastAPI owns the control plane. You can absolutely drop dbt in place of Spark-for-transform once the pattern is clear, and the rest of the platform still applies.
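To show the slice dbt would take over, here is a hedged PySpark sketch of the transform step; the table and column names are invented for illustration:

```python
# Illustrative only: derive a date dimension and a fact table from a
# staged extract, then land both in the Postgres warehouse over JDBC.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("build_star_schema").getOrCreate()

# Staged extract, e.g. written to MinIO by the extract task.
orders = spark.read.parquet("s3a://staging/orders/")

dim_date = (
    orders.select(F.to_date("ordered_at").alias("date"))
    .distinct()
    .withColumn("year", F.year("date"))
    .withColumn("month", F.month("date"))
)

fact_orders = orders.select(
    "order_id",
    "customer_id",
    F.to_date("ordered_at").alias("date"),
    "amount",
)

for name, df in [("dim_date", dim_date), ("fact_orders", fact_orders)]:
    (
        df.write.format("jdbc")
        .option("url", "jdbc:postgresql://warehouse:5432/dw")  # hypothetical DSN
        .option("dbtable", name)
        .option("user", "dw")
        .option("password", "dw")
        .mode("overwrite")
        .save()
    )
```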
Do I really need the FastAPI control plane?
Yes. In production you will be asked "who triggered this run?", "why did this DAG fail?", and "is dataset X fresh?". A typed API that answers these questions, with real health checks, saves every on-call week. The course walks through the 1:1 port from a .NET reference so you see the design choices.
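As one illustration of the shape (not the course's actual endpoint), a freshness check might look like this, assuming psycopg 3, an allow-listed table set, and a timestamptz loaded_at column; every name here is a placeholder:

```python
# Illustrative "is dataset X fresh?" endpoint.
from datetime import datetime, timezone

import psycopg
from fastapi import FastAPI, HTTPException

app = FastAPI()

KNOWN_TABLES = {"fact_orders", "dim_date"}  # allow-list; never interpolate raw input


@app.get("/datasets/{table}/freshness")
def freshness(table: str) -> dict:
    if table not in KNOWN_TABLES:
        raise HTTPException(status_code=404, detail="unknown dataset")
    with psycopg.connect("postgresql://dw:dw@warehouse:5432/dw") as conn:
        row = conn.execute(f"SELECT max(loaded_at) FROM {table}").fetchone()
    loaded_at = row[0]  # timestamptz, so psycopg returns an aware datetime
    age = (datetime.now(timezone.utc) - loaded_at).total_seconds() if loaded_at else None
    return {"table": table, "loaded_at": loaded_at, "age_seconds": age}
```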
Will this transfer to managed services like hosted Airflow or Spark?
Yes. Airflow DAGs, Spark jobs, and warehouse star schemas are the patterns. Managed services change the runtime, not the shapes. The course notes where the port differs and what you will change.
Pricing
One subscription unlocks every paid course and workshop replay. Pick yearly or monthly.
Unlock with Pro
Billed annually. Cancel anytime.
Still deciding? Ask Param a question
Production data pipelines with Airflow, Spark, and FastAPI
From $16/mo with Pro