Wilcoxon Signed-Rank Test
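Question: Is there a significant shift between two related measurements? Strictly, the test checks whether the paired differences are symmetric about zero. A significant result is usually read as the median difference not being zero, which is not the same thing as the two groups' medians differing.
Use it when: You have paired data (as for the paired t-test), but the differences are clearly non-normal, for example skewed or affected by outliers.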
How it works:
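- Compute the difference for each pair, and drop any pair whose difference is exactly 0.
- Rank the absolute differences (ignoring sign) from smallest (rank 1) to largest. Tied values share the average of the ranks they span.
- Reattach the original signs to the ranks.
- Sum the positive ranks (W+) and the negative ranks (W−) separately. For a two-sided test, the test statistic W is the smaller of the two sums.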
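The four steps above can be sketched by hand with NumPy and `scipy.stats.rankdata`, which gives tied values the average rank by default. This is a minimal illustration, not SciPy's internal implementation. The function name `signed_rank_w` and its `decimals` argument are hypothetical, introduced only for this example.

```python
import numpy as np
from scipy.stats import rankdata


def signed_rank_w(before, after, decimals=None):
    """Hypothetical helper: return (W+, W-, W) for paired samples.

    decimals: optional rounding of the differences, so that values which are
    equal on paper (e.g. 1.3 and 1.3) also tie exactly in floating point.
    """
    # Step 1: paired differences (After - Before), then drop exact zeros
    d = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    if decimals is not None:
        d = np.round(d, decimals)
    d = d[d != 0]

    # Step 2: rank |d| from smallest (rank 1) upward; ties get the average rank
    ranks = rankdata(np.abs(d))

    # Step 3: reattach signs by splitting ranks into positive and negative groups
    w_plus = ranks[d > 0].sum()
    w_minus = ranks[d < 0].sum()

    # Step 4: the two-sided statistic W is the smaller rank sum
    return float(w_plus), float(w_minus), float(min(w_plus, w_minus))


# Differences are [2, -1, 4, 0]; the zero is dropped before ranking
print(signed_rank_w([10, 12, 9, 11], [12, 11, 13, 11]))  # (5.0, 1.0, 1.0)
```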
In Python: scipy.stats.wilcoxon(before, after)
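A usage sketch with the paired data from the Step 1 table follows. Note that `scipy.stats.wilcoxon(before, after)` works on `before - after`, which is the opposite sign convention to the After − Before used in the step-by-step walkthrough. For the default two-sided test this does not change the statistic, because it is min(W+, W−).

Here the differences are rounded to the data's precision and passed as the single argument. Plain floating-point subtraction can make differences such as 7.4 − 6.1 and 6.8 − 5.5 disagree in the last bits, which would silently break their tie. The p-value is not shown as an exact figure because it depends on which method SciPy selects (exact distribution or normal approximation), and that choice can vary between SciPy versions.

```python
import numpy as np
from scipy.stats import wilcoxon

# Paired measurements (hours) from the Step 1 table
before = np.array([6.1, 5.5, 7.2, 6.8, 5.0, 6.5, 7.0, 5.8])
after = np.array([7.4, 6.8, 7.5, 7.9, 6.1, 6.4, 8.2, 7.0])

# After - Before, rounded to one decimal so equal differences tie exactly
d = np.round(after - before, 1)

# Two-sided test on the differences; zero differences are discarded by default
# (zero_method="wilcox")
result = wilcoxon(d)
print(result.statistic)  # 1.0
print(result.pvalue)
```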
Wilcoxon Signed-Rank — Step-by-Step
Step 1 — Compute differences
For each pair, compute After − Before. A positive difference means the value went up, and a negative one means it went down. Pairs whose difference is exactly 0 are dropped before ranking. There are none in this dataset, so all 8 pairs are kept.
| Pair | Before (hrs) | After (hrs) | Difference |
|---|---|---|---|
| 1 | 6.1 | 7.4 | +1.3 |
| 2 | 5.5 | 6.8 | +1.3 |
| 3 | 7.2 | 7.5 | +0.3 |
| 4 | 6.8 | 7.9 | +1.1 |
| 5 | 5.0 | 6.1 | +1.1 |
| 6 | 6.5 | 6.4 | -0.1 |
| 7 | 7.0 | 8.2 | +1.2 |
| 8 | 5.8 | 7.0 | +1.2 |
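Steps 2 to 4 can be checked on this table with a short script. In this sketch, `scipy.stats.rankdata` assigns tied values the average of the ranks they span. The differences are rounded to one decimal so that, for example, pairs 1 and 2 tie exactly at 1.3. As a sanity check, the two rank sums should add up to n(n+1)/2 = 8 × 9 / 2 = 36.

```python
import numpy as np
from scipy.stats import rankdata

before = np.array([6.1, 5.5, 7.2, 6.8, 5.0, 6.5, 7.0, 5.8])
after = np.array([7.4, 6.8, 7.5, 7.9, 6.1, 6.4, 8.2, 7.0])

# Step 1: After - Before, rounded to the data's precision; drop zero differences
diff = np.round(after - before, 1)
keep = diff != 0
diff = diff[keep]
pairs = np.arange(1, len(before) + 1)[keep]

# Step 2: rank the absolute differences; ties share the average rank
ranks = rankdata(np.abs(diff))

# Step 3: reattach the sign of each difference to its rank
signed = np.sign(diff) * ranks

for p, d_i, r, s in zip(pairs, diff, ranks, signed):
    print(f"pair {p}: diff {d_i:+.1f}  rank {r:.1f}  signed rank {s:+.1f}")

# Step 4: rank sums and the two-sided test statistic
w_plus = ranks[diff > 0].sum()
w_minus = ranks[diff < 0].sum()
n = len(diff)
assert w_plus + w_minus == n * (n + 1) / 2  # every rank accounted for
print(f"W+ = {w_plus}, W- = {w_minus}, W = {min(w_plus, w_minus)}")
```

For this data, the signed ranks come out as +7.5, +7.5, +2, +3.5, +3.5, −1, +5.5, +5.5. That gives W+ = 35, W− = 1, and W = 1.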
Checkpoint
You measure user engagement scores before and after a UI redesign for 50 users. The scores are skewed with some large outliers (a few power users with very high engagement). Which test is most appropriate?