Determining Effect Size

The hardest input to a power analysis is usually the effect size. Significance level and power are typically set to conventional defaults (α = 0.05 and power = 0.80), which leaves effect size as the one input you have to estimate yourself. Three common ways to estimate it:

◆

Method 1: Pilot Study

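Run a small feasibility study first. Observe the effect size from the pilot, then use it to plan the full study. When you can afford it, this is the most rigorous approach, because the power analysis is grounded in actual data from the real system. Keep in mind that an estimate from a small pilot is itself noisy, so treat it as a starting point rather than a precise value.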
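As a rough sketch of the "use it to plan the full study" step, the snippet below turns an effect size into a per-group sample size for a two-sided comparison of two means. It uses the standard normal approximation, which is an assumption made here for simplicity; a t-test-based calculation gives slightly larger numbers. The function name `n_per_group` and the pilot value `0.35` are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means, using the normal approximation:
    n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Hypothetical effect size observed in a pilot (Cohen's d).
pilot_d = 0.35
print(n_per_group(pilot_d))
```

Because the required sample size scales with 1 / d², a pilot estimate that is too optimistic can leave the full study badly underpowered, so it may be worth rerunning the calculation with a more conservative value as well.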

◆

Method 2: Literature Review

Find prior studies on closely related questions and use their reported effect sizes as your estimate. This is the most common source in academic research, and it is also useful in industry when similar work has been done before: competitor analyses, prior product experiments, published benchmarks.

◆

Method 3: Cohen's Conventions

Jacob Cohen published widely used conventions for what counts as a small, medium, or large effect for various statistical tests. For comparing means, Cohen's d is the standard effect size measure:

Small: d ≈ 0.2
Medium: d ≈ 0.5
Large: d ≈ 0.8

Cohen's d is computed as the difference in means divided by the pooled standard deviation. The thresholds above should be treated as rough guidelines, not strict cutoffs: a "small" effect in a high-stakes context can still be enormously important.
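The definition of Cohen's d can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `cohens_d` and the sample data are made up for the example, and the pooled standard deviation uses the usual sample-variance weighting by degrees of freedom.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / math.sqrt(pooled_var)


# Hypothetical measurements for two groups.
control = [10, 12, 14, 16, 18]
treatment = [12, 14, 16, 18, 20]

d = cohens_d(control, treatment)
print(round(d, 2))  # about 0.63, between "medium" and "large"
```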

⚠

Effect Size ≠ Importance

A 0.01% improvement in CTR might be a "tiny" effect by Cohen's standards, but on a platform with 100 million users it could mean millions of dollars. Always interpret effect size in context — the statistical label "small" is not the same as "doesn't matter."
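To see how small such an effect looks on a standardized scale, here is a sketch using Cohen's h, a standardized effect size for the difference between two proportions whose small/medium/large conventions match those for d. Cohen's h is not named in the text above and is brought in here only for illustration. The baseline CTR of 2% is a hypothetical figure, and the improvement is assumed to mean 0.01 percentage points in absolute terms.

```python
import math


def cohens_h(p1, p2):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p2)) - 2 * math.asin(math.sqrt(p1))


baseline_ctr = 0.0200  # hypothetical baseline click-through rate
improved_ctr = 0.0201  # assumed: +0.01 percentage points, absolute

h = cohens_h(baseline_ctr, improved_ctr)
print(f"Cohen's h = {h:.5f}")  # far below the 0.2 "small" threshold
```

The standardized value says nothing about business impact. Whether the change matters depends on traffic volume and the value of each click, which the effect size label does not capture.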

Checkpoint

You're running a power analysis for an A/B test on a checkout flow change. You have no prior data. Which approach is most appropriate for estimating effect size?
