The Multiple Comparisons Problem

When you run multiple hypothesis tests, the chance of at least one false positive grows quickly. When the null hypothesis is true, a single test at α = 0.05 has a 5% chance of producing a false positive. Run 20 independent tests and the chance of at least one false positive is 1 − (1 − 0.05)^20 ≈ 64%, even when no real effects exist.
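
To see how quickly the risk climbs, here is a minimal sketch that assumes the tests are independent and that every null hypothesis is true. The function name `familywise_error_rate` is made up for illustration.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests,
    assuming every null hypothesis is true."""
    # (1 - alpha) ** m is the chance that all m tests avoid a false positive.
    return 1.0 - (1.0 - alpha) ** m


for m in (1, 5, 10, 20, 50):
    print(f"m = {m:>2}: FWER = {familywise_error_rate(0.05, m):.3f}")
```

With m = 20 this comes out near 0.64, matching the figure above. Correlated tests would behave differently, so treat the formula as a rough guide rather than an exact rate.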

This is the multiple comparisons problem, and you'll bump into it constantly:

Comparing each variant in a 10-arm A/B test against the control.
Testing whether each of 50 features is significantly associated with churn.
Comparing model performance across many cross-validation folds.
Running the same experiment in multiple geographic segments.

Bonferroni Correction

The simplest and most conservative fix. Divide your significance threshold by the number of tests:

$\alpha_{\text{adjusted}} = \frac{\alpha}{m}$

where m is the number of hypotheses being tested. If you're running 5 tests with α = 0.05, each individual test must clear 0.05/5 = 0.01.
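
As a rough sketch of how this looks in code: the helper `bonferroni` and the p-values below are invented for illustration and do not come from a real experiment.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold and a reject/keep decision for each test."""
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    threshold = alpha / m  # alpha_adjusted = alpha / m
    return threshold, [p <= threshold for p in p_values]


# Hypothetical p-values from 5 tests (made up for this example).
p_values = [0.003, 0.012, 0.028, 0.041, 0.200]
threshold, rejected = bonferroni(p_values)
print(threshold, rejected)
```

Here only the first test (p = 0.003) clears the adjusted threshold of 0.01, although four of the five would have passed an uncorrected 0.05 cutoff.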

Trade-off: Bonferroni reduces power — by raising the bar, you'll miss more real effects. It's conservative by design. For very large numbers of tests, consider the Benjamini-Hochberg procedure instead, which controls the false discovery rate rather than the family-wise error rate.
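
For comparison, here is a minimal sketch of the Benjamini-Hochberg step-up procedure. It assumes the standard formulation: sort the p-values, find the largest rank k with p_(k) ≤ (k/m)·q, and reject every hypothesis at or below that rank. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a reject/keep decision per test, controlling the FDR at level q."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from 5 tests (made up for this example).
p_values = [0.003, 0.012, 0.028, 0.041, 0.200]
print(benjamini_hochberg(p_values))
```

On these made-up values the procedure rejects the three smallest p-values, whereas a Bonferroni threshold of 0.05/5 = 0.01 would reject only the first. That gap is the power you trade for the stricter family-wise guarantee.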

Checkpoint

You're running a six-arm A/B test: five variants, each compared against a single control. You want to control your family-wise error rate at α = 0.05. What threshold should each individual variant-versus-control comparison use with Bonferroni correction?
