Take a working medallion notebook and turn it into a scheduled, validated, API-fronted data platform that survives retries, backfills, and a real on-call rotation.
Message a mentor about fit, prerequisites, or where to start. Replies come on WhatsApp, usually within a day.
Graduate from laptop notebooks to a real platform. Orchestrate batch ingestion with Airflow, transform with Spark, land data in a Postgres star schema, validate with Great Expectations, and ship a FastAPI control plane that triggers runs and serves the warehouse. The middle tier of the learnwithparam data engineering track.
Orchestrate batch pipelines with Airflow and Spark, gate them with Great Expectations, and front the warehouse with a FastAPI control plane that triggers DAGs and serves datasets.
What you'll ship
What you'll learn
Curriculum
From laptop to platform
See why a notebook stops scaling and frame the four concerns a real data platform separates: orchestration, compute, storage, and control
Build the batch pipeline
Wire the Airflow DAG that extracts from MySQL, validates the extract with Great Expectations, and runs a Spark job that transforms it into a Postgres star schema (sketched below)
FastAPI control plane
Front the platform with a typed FastAPI service, configured with pydantic-settings, that triggers DAGs, exposes health checks, and reads from the warehouse (see the second sketch below)
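To make the DAG shape concrete, here is a minimal sketch of the kind of pipeline this module builds. It assumes Airflow 2.4+ (for the `schedule` argument) with the Spark provider installed; the DAG id, callables, and job path are illustrative placeholders, not the course's actual code:

```python
# Illustrative only: extract from MySQL, gate with a validation step,
# then hand off to Spark. Assumes apache-airflow-providers-apache-spark.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator


def extract_orders(**context):
    # Pull the day's partition out of MySQL and stage it (e.g. to MinIO).
    ...


def validate_orders(**context):
    # Run a Great Expectations checkpoint against the staged extract;
    # raising here fails the task and stops the downstream transform.
    ...


with DAG(
    dag_id="orders_batch",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_orders)
    validate = PythonOperator(task_id="validate", python_callable=validate_orders)
    transform = SparkSubmitOperator(
        task_id="transform",
        application="/opt/jobs/build_star_schema.py",  # hypothetical path
        conn_id="spark_default",
    )
    extract >> validate >> transform
```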
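And a sketch of the control-plane shape, assuming the Airflow 2 stable REST API with basic auth; the settings fields, route, and credentials are illustrative, not the course's actual service:

```python
# Illustrative control plane: trigger a DAG run and report health.
import httpx
from fastapi import FastAPI
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # Values come from the environment via pydantic-settings.
    airflow_base_url: str = "http://localhost:8080"
    airflow_user: str = "airflow"
    airflow_password: str = "airflow"


settings = Settings()
app = FastAPI()


@app.get("/health")
def health() -> dict:
    return {"status": "ok"}


@app.post("/runs/{dag_id}")
def trigger_run(dag_id: str) -> dict:
    # Airflow 2 stable REST API: POST /api/v1/dags/{dag_id}/dagRuns
    resp = httpx.post(
        f"{settings.airflow_base_url}/api/v1/dags/{dag_id}/dagRuns",
        json={"conf": {}},
        auth=(settings.airflow_user, settings.airflow_password),
    )
    resp.raise_for_status()
    return resp.json()
```

Because the configuration lives in pydantic-settings, the same service points at docker-compose locally and at a managed Airflow later with nothing but environment changes.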
Who it's for
Engineers who finished the medallion notebook and now need to put it on a schedule without duct tape
Anyone maintaining a growing pile of cron jobs who wants to move to a real orchestrator
AI engineers wiring data ingestion for RAG and features, realizing cron plus bash cannot scale
Tech leads asked to host data pipelines who want a reference stack they can teach to the team
FAQ
Do I need a cloud account or a Kubernetes cluster to follow along?
No. The whole stack runs on docker-compose on a laptop. Airflow, Spark, Postgres, MinIO, FastAPI, and Great Expectations all spin up locally with one command. The shapes match production, so the port is mostly mechanical when you graduate to Kubernetes or a managed service.
Why not just use dbt?
dbt handles the transform and tests, which is one slice of what we cover. Airflow owns orchestration, Spark owns heavy compute, and FastAPI owns the control plane. You can absolutely drop dbt in place of Spark-for-transform once the pattern is clear, and the rest of the platform still applies.
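To show the slice dbt would take over, here is a hedged PySpark sketch of the transform step; the table and column names are invented for illustration:

```python
# Illustrative only: derive a date dimension and a fact table from a
# staged extract, then land both in the Postgres warehouse over JDBC.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("build_star_schema").getOrCreate()

# Staged extract, e.g. written to MinIO by the extract task.
orders = spark.read.parquet("s3a://staging/orders/")

dim_date = (
    orders.select(F.to_date("ordered_at").alias("date"))
    .distinct()
    .withColumn("year", F.year("date"))
    .withColumn("month", F.month("date"))
)

fact_orders = orders.select(
    "order_id",
    "customer_id",
    F.to_date("ordered_at").alias("date"),
    "amount",
)

for name, df in [("dim_date", dim_date), ("fact_orders", fact_orders)]:
    (
        df.write.format("jdbc")
        .option("url", "jdbc:postgresql://warehouse:5432/dw")  # hypothetical DSN
        .option("dbtable", name)
        .option("user", "dw")
        .option("password", "dw")
        .mode("overwrite")
        .save()
    )
```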
Do I really need the FastAPI control plane?
Yes. In production you will be asked "who triggered this run?", "why did this DAG fail?", and "is dataset X fresh?". A typed API that answers these questions, with real health checks, saves every on-call week. The course walks through the 1:1 port from a .NET reference so you see the design choices.
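As one illustration of the shape (not the course's actual endpoint), a freshness check might look like this, assuming psycopg 3, an allow-listed table set, and a timestamptz loaded_at column; every name here is a placeholder:

```python
# Illustrative "is dataset X fresh?" endpoint.
from datetime import datetime, timezone

import psycopg
from fastapi import FastAPI, HTTPException

app = FastAPI()

KNOWN_TABLES = {"fact_orders", "dim_date"}  # allow-list; never interpolate raw input


@app.get("/datasets/{table}/freshness")
def freshness(table: str) -> dict:
    if table not in KNOWN_TABLES:
        raise HTTPException(status_code=404, detail="unknown dataset")
    with psycopg.connect("postgresql://dw:dw@warehouse:5432/dw") as conn:
        row = conn.execute(f"SELECT max(loaded_at) FROM {table}").fetchone()
    loaded_at = row[0]  # timestamptz, so psycopg returns an aware datetime
    age = (datetime.now(timezone.utc) - loaded_at).total_seconds() if loaded_at else None
    return {"table": table, "loaded_at": loaded_at, "age_seconds": age}
```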
Will this transfer to managed services like hosted Airflow or Spark?
Yes. Airflow DAGs, Spark jobs, and warehouse star schemas are the patterns. Managed services change the runtime, not the shapes. The course notes where the port differs and what you will change.
Pricing
One subscription unlocks every paid course and workshop replay. Pick yearly or monthly.
Unlock with Pro
Billed annually. Cancel anytime.
Still deciding? Ask Param a question
Production data pipelines with Airflow, Spark, and FastAPI
From $16/mo with Pro