Unit 1

Data Storytelling

From raw information to actionable insight: how data is represented, sourced, explored, visualized, and prepared for modeling — and the ethical responsibilities that come with it.

Chapter 1

Information Representation

Every piece of data your model will ever see — images, text, sensor readings, user clicks — must live inside a computer as numbers. This chapter explains why that's true, why it's harder than it sounds, and how to make smart choices when converting the messy real world into numeric representations.

Chapter 2

Sourcing Data

Where does data come from — and what tradeoffs did you accept the moment you chose that source? This chapter covers surveys, web scraping, user data, APIs, and sensors, then builds the organizational discipline (IRB, bias mitigation, documentation) that separates usable datasets from data swamps.

Chapter 3

Exploratory Data Analysis

EDA is the bridge between 'we have data' and 'we can model.' This chapter teaches you the questions to ask, the statistics to compute, the patterns to look for, and how to turn a complete EDA pass into a concrete modeling plan.

Chapter 4

Telling Stories with Data

Producing good insights is only half the job. Delivering them to people who need to make decisions is the other half — and it requires a different set of skills. This chapter covers visualization choice, audience design, and the three-part anatomy of a data story that actually lands.

Chapter 5

Preparing Data for Modeling

The chapter that decides whether your model works in the real world or only in your notebook. Learn why three splits (train, validation, test) beat two, how cross-validation works, and, most importantly, how data leakage quietly destroys models that look great in evaluation.

Chapter 6

Preprocessing

Before any complex modeling, you have to get your data into a state fit for modeling. This chapter covers missing features, missing values (MCAR/MAR/MNAR), outlier detection and handling, data transformations, and the structured quality assessment process that separates professional data work from notebook hacking.

Chapter 7

Feature Engineering

The highest-leverage thing you can do for model performance — and the place where domain expertise pays off more than anywhere else in the pipeline. This chapter covers feature engineering for tabular, time series, image, and text data, plus dimensionality reduction and feature selection.

Chapter 8

Data Risks, Bias, & Ethics

The last chapter and the most important. Everything covered so far in this unit is technical. This chapter is where the technical and the ethical fuse together: it covers bias sources, fairness frameworks, explainability, privacy law, and the practitioner's responsibility when shipping AI systems that affect real people.