Simpson's Paradox

One of the most counterintuitive (and important!) phenomena in applied statistics is Simpson's Paradox.

Simpson's Paradox occurs when a trend or relationship visible in subgroups of data disappears or reverses when the subgroups are aggregated. A drug appears effective in men, appears effective in women, and appears ineffective in the combined dataset. An algorithm appears better than another within each user segment but worse overall.


A Famous Real Example: UC Berkeley Admissions (1973)

Aggregate data showed men were admitted at a higher rate than women (44% vs. 35%), suggesting gender bias. But when broken down by department, most departments showed women being admitted at equal or higher rates than men.

The resolution: women disproportionately applied to more competitive departments (lower overall admission rates). The aggregate rate was driven by department selection, not discrimination within departments.
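The mechanism can be reproduced with a toy calculation. The sketch below uses made-up counts for two hypothetical departments (they are not the real 1973 figures, and the names `easy_dept` and `hard_dept` are illustrative) to show how a group can lead within every department and still trail in the pooled data.

```python
# Hypothetical counts for illustration only (NOT the real 1973 Berkeley data).
# Each entry maps group -> (applicants, admitted).
applications = {
    "easy_dept": {"men": (800, 480), "women": (200, 130)},
    "hard_dept": {"men": (200, 40), "women": (800, 200)},
}


def admit_rate(applicants, admitted):
    """Fraction of applicants who were admitted."""
    return admitted / applicants


# Within-department admission rates.
per_dept = {
    dept: {group: admit_rate(*counts) for group, counts in groups.items()}
    for dept, groups in applications.items()
}

# Aggregate rates: pool applicants and admits across departments first,
# then divide. This is what a single "overall" number reports.
totals = {}
for groups in applications.values():
    for group, (applicants, admitted) in groups.items():
        prev_app, prev_adm = totals.get(group, (0, 0))
        totals[group] = (prev_app + applicants, prev_adm + admitted)
overall = {group: admit_rate(*counts) for group, counts in totals.items()}

for dept, rates in per_dept.items():
    print(dept, {g: round(r, 2) for g, r in rates.items()})
print("overall", {g: round(r, 2) for g, r in overall.items()})
```

In this toy data women are ahead in both departments (65% vs. 60% and 25% vs. 20%) yet behind overall (33% vs. 52%), purely because 80% of the women applied to the department that admits few applicants of either gender.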


What Causes It

Confounding variables: a third variable influences both the predictor and the outcome, and its distribution differs across the groups being compared.
Group heterogeneity: differences in the size or composition of subgroups pull the aggregate trend in a different direction than the within-group trends.
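Both causes come down to the same arithmetic: an aggregate rate is a weighted average of within-subgroup rates. The notation below is an assumption introduced for illustration: $p_{g,k}$ is the outcome rate for group $g$ in subgroup $k$, and $n_{g,k}$ is the number of observations of group $g$ in subgroup $k$.

```latex
\[
\bar{p}_g \;=\; \sum_{k} w_{g,k}\, p_{g,k},
\qquad
w_{g,k} \;=\; \frac{n_{g,k}}{\sum_{j} n_{g,j}},
\qquad
\sum_{k} w_{g,k} \;=\; 1 .
\]
```

Even if $p_{A,k} > p_{B,k}$ holds in every subgroup $k$, it is still possible that $\bar{p}_A < \bar{p}_B$, because the two groups use different weights. If group $A$ puts most of its weight on subgroups where the rate is low for everyone, its weighted average is dragged down.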

Always run subgroup analysis alongside aggregate analysis. Visualize relationships within meaningful slices of your data before reporting the aggregate result. This is part of what makes good exploratory data analysis so much more than just "looking at the data."
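One way to make this a habit is a small helper that always reports both views together. The following is a minimal sketch using only the standard library; the function names `success_rates` and `is_reversed`, the row layout, and the drug-trial numbers are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def success_rates(rows):
    """rows: iterable of (subgroup, arm, successes, trials).

    Returns (aggregate, within): success rate per arm overall,
    and success rate per arm inside each subgroup.
    """
    pooled = defaultdict(lambda: [0, 0])
    by_subgroup = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for subgroup, arm, successes, trials in rows:
        pooled[arm][0] += successes
        pooled[arm][1] += trials
        by_subgroup[subgroup][arm][0] += successes
        by_subgroup[subgroup][arm][1] += trials
    aggregate = {arm: s / n for arm, (s, n) in pooled.items()}
    within = {
        sg: {arm: s / n for arm, (s, n) in arms.items()}
        for sg, arms in by_subgroup.items()
    }
    return aggregate, within


def is_reversed(aggregate, within, a, b):
    """True when every subgroup agrees on whether `a` beats `b`,
    but the aggregate comparison points the other way.
    Ties are treated as "a does not beat b".
    """
    signs = {rates[a] > rates[b] for rates in within.values()}
    if len(signs) != 1:
        return False  # subgroups disagree with each other; no clean reversal
    return signs.pop() != (aggregate[a] > aggregate[b])


# Made-up trial data: the drug is given mostly to severe cases.
rows = [
    ("mild", "drug", 90, 100),
    ("mild", "control", 800, 1000),
    ("severe", "drug", 300, 1000),
    ("severe", "control", 20, 100),
]
aggregate, within = success_rates(rows)
print("aggregate:", aggregate)
print("within:", within)
print("reversal detected:", is_reversed(aggregate, within, "drug", "control"))
```

A flagged reversal does not say which view is correct. It is a prompt to ask what determines subgroup membership before trusting either number.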

Simpson's Paradox — UC Berkeley Admissions (1973)

Aggregate analysis

Overall, men were admitted at 44% vs. 35% for women, which suggests bias. But the aggregate hides a confounding variable: the department each applicant applied to.

Why it happens

Simpson's Paradox arises when a confounding variable is correlated with both the grouping and the outcome. Here, women self-selected into harder departments — so the aggregate rate reflects department difficulty, not bias. Always ask: is there a lurking variable that determines who ends up in which group?


Display a Simpson's Paradox scenario — show the aggregate trend and the per-subgroup trends side by side. Adjust the subgroup sizes to watch the paradox emerge and disappear.
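A rough, non-interactive version of this experiment can be sketched in a few lines. The code below holds the within-subgroup rates fixed and changes only how one group is split between an "easy" and a "hard" subgroup. All rates, the group labels A and B, and the helper `aggregate_rate` are assumptions made up for illustration.

```python
def aggregate_rate(share_hard, rate_easy, rate_hard):
    """Overall rate as a weighted average of the two subgroup rates."""
    return (1 - share_hard) * rate_easy + share_hard * rate_hard


# Assumed within-subgroup rates: group A is ahead in BOTH subgroups.
A = {"easy": 0.65, "hard": 0.25}
B = {"easy": 0.60, "hard": 0.20}

# Group B's mix is held fixed: 20% of its members are in the hard subgroup.
B_SHARE_HARD = 0.2
b_overall = aggregate_rate(B_SHARE_HARD, B["easy"], B["hard"])

# Sweep the share of group A that falls in the hard subgroup.
sweep = []
for pct in range(0, 101, 10):
    a_overall = aggregate_rate(pct / 100, A["easy"], A["hard"])
    paradox = a_overall < b_overall  # A leads within subgroups, trails overall
    sweep.append((pct, a_overall, paradox))
    print(f"A in hard subgroup: {pct:3d}%  A overall: {a_overall:.3f}  "
          f"B overall: {b_overall:.3f}  paradox: {paradox}")
```

Under these assumed rates, group A's aggregate is 0.65 - 0.40 * share, which drops below group B's 0.52 once the share exceeds 0.325. The paradox is therefore absent at 30% and present from 40% upward, even though no within-subgroup rate ever changes.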

Checkpoint

An analysis shows that among all customers, those who used the chat support feature had a lower 30-day retention rate than those who didn't. But within each customer tier (new, returning, VIP), chat users have higher retention. What is the most likely explanation?
