P-Values, Carefully

The p-value is the probability of obtaining test results at least as extreme as your observed results, assuming the null hypothesis is true. It ranges from 0 to 1. Smaller p-values indicate stronger evidence against the null.

That definition is precise and commonly misread. Let's slow down on it.
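To make "at least as extreme, assuming the null is true" concrete, here is a minimal sketch for a coin-flip test. The scenario (60 heads in 100 flips, null hypothesis of a fair coin) and the function name `binomial_two_sided_p` are illustrative assumptions, not part of the lesson. The function adds up the null probability of every outcome that sits at least as far from the expected count as the observed one.

```python
from math import comb


def binomial_two_sided_p(heads: int, flips: int) -> float:
    """Exact two-sided p-value under the null of a fair coin (P(heads) = 0.5)."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    # Count the equally likely head/tail sequences whose head count is at
    # least as far from the expected count as the observed one.
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_gap
    )
    # Under a fair coin every one of the 2**flips sequences is equally likely.
    return extreme / 2 ** flips


# Hypothetical data: 60 heads in 100 flips.
p = binomial_two_sided_p(heads=60, flips=100)
print(f"p-value for 60 heads in 100 flips: {p:.4f}")
```

Notice that the calculation never asks how likely the null is. It assumes the coin is fair and then asks how surprising the data would be.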

Common Misinterpretations of the p-value

Wrong: "p = 0.03 means there's a 3% chance the null is true."
Right: "If the null were true, we'd see results this extreme about 3% of the time." The p-value is about the data given the null — not about the null given the data.
Wrong: "p < 0.05 means the effect is real and important."
Right: Statistical significance and practical significance are different. A tiny effect can have a tiny p-value if you have enough data.
Wrong: "p > 0.05 proves there's no effect."
Right: Failing to reject the null means you lack evidence for an effect — not that there's no effect. The effect might be real but your sample might be too small to detect it.
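The last two pairs both come down to sample size, and a short sketch shows why. The conversion rates (10.0% vs. 10.1%), the sample sizes, and the helper `two_proportion_p` are hypothetical, and the test used is a pooled two-proportion z-test with equal group sizes, chosen here only for illustration. The observed difference is held fixed while the number of users per group grows.

```python
from math import erfc, sqrt


def two_proportion_p(rate_a: float, rate_b: float, n_per_arm: int) -> float:
    """Two-sided p-value from a pooled two-proportion z-test (equal arm sizes)."""
    pooled = (rate_a + rate_b) / 2
    standard_error = sqrt(2 * pooled * (1 - pooled) / n_per_arm)
    z = (rate_b - rate_a) / standard_error
    # P(|Z| >= |z|) for a standard normal Z.
    return erfc(abs(z) / sqrt(2))


# Hypothetical observed rates: the same 0.1 percentage point gap at every n.
results = {
    n: two_proportion_p(0.100, 0.101, n)
    for n in (1_000, 100_000, 10_000_000)
}
for n, p in results.items():
    print(f"n per arm = {n:>10,}  p = {p:.3g}")
```

With a thousand users per group the gap is nowhere near significant, which does not show the effect is absent. With ten million per group the same gap yields a very small p-value, which does not make it any larger or more important.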

Common significance thresholds (α): 0.05 is the conventional default in many fields; 0.01 is used when false positives are particularly costly; 0.10 is sometimes used in exploratory work. Choose α before looking at the data.
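One way to see what α buys you is to simulate many experiments in which the null really is true and count how often each threshold rejects anyway. This is a rough sketch under stated assumptions: the test statistic is drawn directly from a standard normal, the test is two-sided, and the function name, trial count, and seed are arbitrary choices for illustration.

```python
import random
from math import erfc, sqrt


def false_positive_rate(alpha: float, trials: int = 20_000, seed: int = 0) -> float:
    """Share of simulated true-null experiments with p < alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        z = rng.gauss(0.0, 1.0)      # test statistic when the null is true
        p = erfc(abs(z) / sqrt(2))   # two-sided p-value for that statistic
        if p < alpha:
            rejections += 1
    return rejections / trials


rates = {alpha: false_positive_rate(alpha) for alpha in (0.10, 0.05, 0.01)}
for alpha, rate in rates.items():
    print(f"alpha = {alpha:.2f}  share of true nulls rejected = {rate:.3f}")
```

The share of false alarms tracks α closely, which is why a stricter threshold makes sense when false positives are costly.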

Use p-values as one piece of evidence for decisions, not as proof. They are tools, not verdicts.

P-Value Visualizer

The shaded area is the p-value: the probability of observing a test statistic at least this extreme under the null hypothesis (standard normal). The visualizer takes a test statistic z between 0.00 and 4.00, a choice of one- or two-tailed test, and a significance level α, and it updates the shaded area and the rejection decision as you change them.

Default view: z = 1.96, two-tailed, on an axis from −4 to +4 with cutoffs marked at ±1.96. The p-value is P(|Z| ≥ 1.96) = 0.0500, which is statistically significant at α = 0.05, so the decision is to reject H₀.

Reminder: statistical significance does not imply practical importance. Always consider effect size alongside the p-value.
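The shaded-area calculation can be sketched in a few lines using only the standard library. The function names `p_value` and `decide`, the tail labels ("two", "upper", "lower"), and the choice to reject when p < α are assumptions made for this example; the visualizer's actual implementation is not shown in the lesson.

```python
from math import erfc, sqrt


def p_value(z: float, tail: str = "two") -> float:
    """Area under the standard normal curve at least as extreme as z."""
    upper = 0.5 * erfc(z / sqrt(2))  # P(Z >= z)
    if tail == "upper":
        return upper
    if tail == "lower":
        return 1.0 - upper  # P(Z <= z)
    if tail == "two":
        return erfc(abs(z) / sqrt(2))  # P(|Z| >= |z|)
    raise ValueError("tail must be 'two', 'upper', or 'lower'")


def decide(p: float, alpha: float = 0.05) -> str:
    """Compare a p-value with a significance level chosen in advance."""
    return "reject H0" if p < alpha else "fail to reject H0"


for tail in ("two", "upper"):
    p = p_value(1.96, tail)
    print(f"z = 1.96, {tail}-tailed: p = {p:.4f}, {decide(p)}")
```

At z = 1.96 the two-tailed p-value prints as 0.0500 but is in fact slightly below 0.05, which is why the decision at α = 0.05 is to reject. Switching to a one-tailed test halves the shaded area for the same z.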

Checkpoint

An A/B test shows that users who saw the new homepage design clicked "sign up" at a rate of 4.21% vs. 4.20% for the old design, with p = 0.003. What is the most appropriate conclusion?
