Unit 3

Data/ML Engineering

Where does your data actually live, how does it get to you, and how does the model you build ever escape your laptop? This unit covers data storage, data pipelines, and ML pipelines — the infrastructure layer that separates a simple class project from a production system.

Chapter 1

Data Storage

Where your data lives constrains the modeling decisions available to you. This chapter covers relational databases, NoSQL stores, vector databases, data warehouses, and data lakes, and builds a decision framework for choosing among them.

Chapter 2

Building Data Pipelines

Data engineering is the infrastructure that makes modeling possible. This chapter covers the ETL pattern, pipeline orchestration with DAGs, distributed processing with Spark, the major cloud warehouses, and how to design a pipeline from source to dashboard.

Chapter 3

Building ML Pipelines

Getting a model out of the notebook and into production is the second half of the ML engineer's job. This chapter covers the full MLOps lifecycle: containerization with Docker, deployment strategies, CI/CD, monitoring for data drift, scalability, cost, and security. It closes with building demos that get you hired.