Statistical Power
Statistical power is the probability that your study will detect an effect when there really is an effect to detect. Formally: power = 1 − β, where β is the probability of a Type 2 error (failing to reject the null hypothesis when it is false).
If your study is underpowered, you are likely to miss real effects. Worse, you may conclude that there is no effect when there is one. An underpowered study isn't just inconclusive; it can be actively misleading.
The Four Interlocking Components
Power analysis rests on four interconnected quantities. Specify any three, and the fourth is determined:
- Effect size — the magnitude of the difference you're trying to detect.
- Sample size — the number of observations.
- Significance level (α) — the probability of a Type 1 error (usually 0.05).
- Power (1 − β) — the probability of avoiding a Type 2 error (usually 0.80).
The most common scenario: specify effect size, α, and power → solve for minimum sample size.
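As a rough illustration of that scenario, the sketch below solves for the per-group sample size using only the Python standard library. It assumes a two-sided, two-sample comparison of means with equal group sizes and a normal approximation to the test statistic; the function name `min_n_per_group` is hypothetical, and none of these choices come from the text above.

```python
from math import ceil
from statistics import NormalDist


def min_n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sample comparison of means.

    Assumptions (not from the original text): two-sided test, equal group
    sizes, standardized effect size d, normal approximation.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


n_medium = min_n_per_group(0.5)  # 63 under this approximation
n_small = min_n_per_group(0.2)   # a smaller effect needs far more data
print(n_medium, n_small)
```

An exact calculation based on the t distribution generally returns a slightly larger number than this approximation, so treat the result as a lower bound for planning.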
Figure: the null distribution (H₀, indigo curve) and the alternative distribution (H₁, green curve), with the critical value marked by a dashed line. The green shaded region is power; the red region is β. Changing effect size, sample size, or α shifts the two distributions relative to the critical value.
Power gets higher when:
- Sample size increases. More data = more power. This is usually your main lever.
- Effect size is larger. Big effects are easier to detect.
Power gets lower when:
- The significance threshold becomes stricter. Moving from α = 0.05 to α = 0.01 makes it harder to clear the bar.
- Variability in the data increases. Noisy data masks signal.
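The four levers above can be checked numerically. The following is a minimal sketch, assuming a two-sided, two-sample comparison with equal group sizes and a normal approximation; the function `approx_power` and the baseline numbers (a raw difference of 5, a standard deviation of 10, 64 observations per group) are illustrative choices, not values from the text.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(raw_diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation)."""
    z = NormalDist()
    d = raw_diff / sd                   # standardized effect size
    shift = d * sqrt(n_per_group / 2)   # how far H1 sits from H0, in standard errors
    z_crit = z.inv_cdf(1 - alpha / 2)   # critical value
    # Probability of landing in either rejection region when H1 is true.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


baseline = approx_power(raw_diff=5, sd=10, n_per_group=64)

# Change one input at a time and compare against the baseline.
print("baseline:        ", round(baseline, 3))
print("larger sample:   ", round(approx_power(5, 10, 128), 3))
print("larger effect:   ", round(approx_power(8, 10, 64), 3))
print("stricter alpha:  ", round(approx_power(5, 10, 64, alpha=0.01), 3))
print("noisier data:    ", round(approx_power(5, 20, 64), 3))
```

In this formulation, variability enters through the standardized effect size: doubling the standard deviation halves d, which is why noisier data lowers power even when the raw difference is unchanged.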
A study has 80% power at α = 0.05 to detect an effect of size d = 0.5. A researcher decides to use a stricter significance threshold of α = 0.01. All else equal, what happens to power?
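One way to reason about this question is to recover how far apart the two distributions must be to give 80% power at α = 0.05, then recompute power against the stricter critical value. The sketch below does this under assumptions that the question does not state: a two-sided test, a normal approximation, and a negligible contribution from the far rejection tail.

```python
from statistics import NormalDist

z = NormalDist()

# Separation between H0 and H1 (in standard errors) implied by
# 80% power at two-sided alpha = 0.05.
shift = z.inv_cdf(1 - 0.05 / 2) + z.inv_cdf(0.80)

# Same study, same separation, but a stricter critical value.
z_crit_strict = z.inv_cdf(1 - 0.01 / 2)
power_strict = 1 - z.cdf(z_crit_strict - shift)

print(round(power_strict, 2))  # about 0.59 under these assumptions
```

Power falls from 80% to roughly 59%, because the critical value moves further into the alternative distribution while nothing else changes. The effect size d = 0.5 does not appear explicitly: it is already absorbed into the separation implied by the original 80% power.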