Chapter 2
Building Data Pipelines
Data engineering provides the infrastructure that makes modeling possible. This chapter covers the extract, transform, load (ETL) pattern, pipeline orchestration with directed acyclic graphs (DAGs), distributed processing with Spark, the major cloud data warehouses, and how to design a pipeline from source to dashboard.