Learning path

Data Engineer courses

Data engineers move data from where it lives to where it's useful. These courses focus on the Python-first tooling most AI-adjacent teams pick. There is no Spark-centric career track here. You'll learn pipelines, embeddings, retrieval patterns for RAG, and the observability needed to trust what you ship.

Curated by Param Harrison

Every course leans practical: real datasets, real retrieval problems, real evaluation. If you've been piecing together AI systems from YouTube tutorials, this track gives you the scaffolding those videos skip.

Ultimate PostgreSQL Bootcamp: Go from Beginner to Expert

From SELECT to production PostgreSQL mastery on 100k+ real e-commerce orders.

Beginner · Pro

Data engineering foundations with the medallion pattern

Bronze, silver, gold warehouse patterns with DuckDB and pandas, rehearsed on a real dataset before you touch Airflow or Spark.

Beginner · Free

Production data pipelines with Airflow, Spark, and FastAPI

Orchestrate batch pipelines with Airflow and Spark, gate them with Great Expectations, and front the warehouse with a FastAPI control plane that triggers DAGs and serves datasets.

Intermediate · Pro

Enterprise data platform from ingestion to governance

Run a twenty-service production data platform end to end: batch, streaming, warehouse, ML tracking, lineage, observability, and a typed FastAPI control plane.

Advanced · Pro

AWS Glue and PySpark ETL on a real flight dataset

Write a PySpark transform that runs identically locally and on AWS Glue 4.0. Ship via CodeBuild, validate with a local smoke run, and skip the surprise DPU bills.

Intermediate · Pro

AWS lakehouse with Apache Iceberg, Glue, and Snowflake

Lambda to S3 to Glue PySpark to Iceberg in the Glue Data Catalog, queried by Snowflake as external tables. Airflow orchestrates. Terraform provisions. CodeBuild ships.

Intermediate · Pro

GCP analytics with BigQuery, dbt, and Cloud Run

GCS to BigQuery to dbt star schema, with Cloud Run hosting the dbt runner and Cloud Build deploying on push. Airflow schedules the run. Terraform owns the infra.

Intermediate · Pro

Common questions

Data Engineer: quick answers

What does a data engineer do now that AI exists?

Same core job (move and shape data) plus a new layer: embeddings, vector stores, retrieval quality, and eval infrastructure. Teams expect you to own the data path into LLM pipelines, not just analytical warehouses.

Do I need Airflow or dbt?

No. These courses stay Python-first (cron + scripts + typed pipelines) because that’s what most AI-focused teams run. Airflow and dbt are worth learning when you’re at a specific scale, but they’re not prerequisites here.

Is SQL enough?

SQL gets you to analytical work. For AI pipelines you also need Python for embedding, chunking, retrieval, and evaluation. The SQL for Engineers course covers the SQL half; the RAG and Python courses cover the other half.

How is this different from the AI Engineer role?

Data engineers own the pipeline (ingestion, cleaning, indexing). AI engineers own the model interaction (prompts, agents, tool use). In small teams one person does both.

Best first course if I’m coming from analytics?

Start with Python for GenAI, then RAG Fundamentals. After that, the observability course helps you prove the retrieval layer you built actually works.
