Statistical Power

Statistical power is the probability that your study will detect an effect when there really is an effect to detect. Formally: power = 1 − β, where β is the probability of a Type 2 error (failing to reject the null hypothesis when it is false).

If your study is underpowered, you may very well miss real effects. Worse, you may ship the conclusion that there's no effect when there is one. An underpowered study isn't just inconclusive; when its non-significant result is read as evidence of no effect, it's actively misleading.
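
One way to make the definition concrete is to simulate many studies in which the effect is real and count how often the test detects it. The sketch below is illustrative only: it assumes a one-sided, one-sample z-test with a known standard deviation of 1, and the function name `simulate_power` and its parameters are hypothetical choices, not something taken from this lesson.

```python
import random
from statistics import NormalDist, fmean


def simulate_power(d, n, alpha=0.05, n_sims=2000, seed=0):
    """Estimate power as the fraction of simulated studies that reject H0.

    Assumption: one-sided, one-sample z-test with known standard deviation 1,
    so the true mean under H1 equals the standardized effect size d.
    """
    rng = random.Random(seed)                  # seeded for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha)   # critical value under H0
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(d, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5           # sample mean / (sigma / sqrt(n))
        if z > z_crit:
            rejections += 1
    return rejections / n_sims


power = simulate_power(d=0.5, n=40)
print(f"estimated power: {power:.3f}, estimated beta: {1 - power:.3f}")
```

The fraction of simulated studies that fail to reject H₀ is an estimate of β, the miss rate. Other test designs (two-sided, two-sample, unknown variance) would give different numbers.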

The Four Interlocking Components

Power analysis sits on four numbers that are all interconnected. Specify any three, and the fourth is determined:

  1. Effect size — the magnitude of the difference you're trying to detect.
  2. Sample size — the number of observations.
  3. Significance level (α) — the probability of a Type 1 error (usually 0.05).
  4. Power (1 − β) — the probability of avoiding a Type 2 error (usually 0.80).

The most common scenario: specify effect size, α, and power → solve for minimum sample size.
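
As a rough sketch of how three of the numbers pin down the fourth, the code below solves for power given (d, n, α) and for the minimum n given (d, α, power). It assumes a one-sided, one-sample z-test under a normal approximation; the lesson does not say which test it has in mind, and the helper names `power_z` and `min_sample_size` are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_z(d, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test for standardized effect d."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)       # critical value under H0
    return STD_NORMAL.cdf(d * sqrt(n) - z_crit)  # P(reject H0 | H1 is true)


def min_sample_size(d, alpha=0.05, power=0.80):
    """Smallest n that reaches the target power, under the same assumptions."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha + z_power) / d) ** 2)


n_required = min_sample_size(d=0.5, alpha=0.05, power=0.80)
print(n_required, round(power_z(0.5, n_required), 3))
```

Under these assumptions, d = 0.5 with n = 40 and α = 0.05 gives a power of about 0.935, which agrees with the explorer's 93.5% readout, though the lesson does not state which test the explorer uses. A two-sided or two-sample design would require a larger sample for the same power.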

Power Analysis Explorer

The indigo curve is the null distribution (H₀); the green curve is the alternative (H₁). The dashed line is the critical value. The green shaded region is power (1 − β); the red region is β, the miss rate. Adjust the sliders for effect size (Cohen's d, 0.10 to 1.00), sample size (n, 5 to 300), and significance level (α) to see how the two distributions shift relative to the critical value and how power changes.

Example reading: with d = 0.50 and n = 40, the explorer reports power of 93.5% (β = 6.5%), which meets the conventional 80% threshold.

Power gets higher when:

  • Sample size increases. More data = more power. This is usually your main lever.
  • Effect size is larger. Big effects are easier to detect.

Power gets lower when:

  • The significance threshold becomes stricter. Moving from α = 0.05 to α = 0.01 makes it harder to clear the bar.
  • Variability in the data increases. Noisy data masks signal: for the same raw difference, a larger standard deviation means a smaller standardized effect size.
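
The four levers above can be compared side by side by changing one input at a time from a common baseline. This is a minimal sketch, again assuming a one-sided, one-sample z-test with a normal approximation; the function `power_from_raw` and the baseline numbers (raw difference 5, standard deviation 10, n = 40) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_from_raw(raw_diff, sigma, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test given a raw difference and SD."""
    d = raw_diff / sigma                      # standardized effect size
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)    # stricter alpha -> larger z_crit
    return STD_NORMAL.cdf(d * sqrt(n) - z_crit)


baseline = power_from_raw(raw_diff=5.0, sigma=10.0, n=40)
scenarios = {
    "baseline (diff=5, sigma=10, n=40, alpha=0.05)": baseline,
    "larger sample (n=80)": power_from_raw(5.0, 10.0, 80),
    "larger effect (diff=8)": power_from_raw(8.0, 10.0, 40),
    "stricter alpha (0.01)": power_from_raw(5.0, 10.0, 40, alpha=0.01),
    "noisier data (sigma=15)": power_from_raw(5.0, 15.0, 40),
}
for label, value in scenarios.items():
    print(f"{label}: {value:.3f}")
```

Relative to the baseline, the first two changes raise power and the last two lower it. Variability enters only through the ratio of raw difference to standard deviation, which is why it does not appear as a separate item among the four components.
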

Checkpoint

A study has 80% power at α = 0.05 to detect an effect of size d = 0.5. A researcher decides to use a stricter significance threshold of α = 0.01. All else equal, what happens to power?