The Test Decision Tree

When I was a graduate student, I kept an index card on my desk with a decision tree for picking the right two-group statistical test. Here is the grown-up version:

Two-Group Test Decision Tree

Compare two groups:

  • Paired data (same unit measured twice). Differences normal? Check with Shapiro-Wilk on the paired differences.
      • Normal: Paired t-test, ttest_rel(before, after)
      • Not normal: Wilcoxon signed-rank, wilcoxon(before, after)
  • Independent data (different subjects in each group). Both groups normal? Check with a Q-Q plot or Shapiro-Wilk.
      • Not normal: Mann-Whitney U, mannwhitneyu(a, b)
      • Normal: Equal variances? Check with Levene's test.
          • Equal: Student's t-test, ttest_ind(a, b, equal_var=True)
          • Not equal: Welch's t-test, ttest_ind(a, b, equal_var=False)

The two-group comparison decision tree. Start at the top and follow the branches to trace the path to a test.
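The tree can also be walked in code. The following is a minimal sketch, not a definitive implementation: the function names `pick_test` and `run_two_group_test`, the `alpha=0.05` cutoff for the assumption checks, and the simulated data are all assumptions made for illustration. It uses Shapiro-Wilk for the normality check, although the tree also allows a visual Q-Q plot there.

```python
import numpy as np
from scipy import stats


def pick_test(paired, normal, equal_var=None):
    """Walk the decision tree and return the name of the recommended test."""
    if paired:
        return "paired t-test" if normal else "Wilcoxon signed-rank"
    if not normal:
        return "Mann-Whitney U"
    return "Student's t-test" if equal_var else "Welch's t-test"


def run_two_group_test(a, b, paired, alpha=0.05):
    """Check assumptions, pick a test from the tree, and return (name, p-value).

    alpha is an assumed cutoff for the assumption checks, not a rule.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    if paired:
        # Paired branch: normality is checked on the differences.
        normal = stats.shapiro(a - b).pvalue > alpha
        equal_var = None
    else:
        # Independent branch: both groups must look normal.
        normal = (stats.shapiro(a).pvalue > alpha
                  and stats.shapiro(b).pvalue > alpha)
        # Levene's test only matters on the parametric path.
        equal_var = stats.levene(a, b).pvalue > alpha if normal else None

    name = pick_test(paired, normal, equal_var)
    runners = {
        "paired t-test": lambda: stats.ttest_rel(a, b),
        "Wilcoxon signed-rank": lambda: stats.wilcoxon(a, b),
        "Mann-Whitney U": lambda: stats.mannwhitneyu(a, b),
        "Student's t-test": lambda: stats.ttest_ind(a, b, equal_var=True),
        "Welch's t-test": lambda: stats.ttest_ind(a, b, equal_var=False),
    }
    return name, runners[name]().pvalue


# Simulated data, for illustration only.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=30)
group_b = rng.normal(loc=0.5, scale=1.0, size=30)
print(run_two_group_test(group_a, group_b, paired=False))
```

Keeping `pick_test` separate from the assumption checks means the branching logic can be inspected on its own, and the checks can be swapped for a judgment call based on a Q-Q plot.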

For Categorical Data

The above tree handles continuous outcomes. For categorical outcomes:

  • Two categorical variables: Chi-square test of independence (or Fisher's exact test if expected cell counts are small, commonly below 5).
  • One categorical variable vs. expected distribution: Chi-square goodness-of-fit.
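Both categorical cases map onto `scipy.stats` calls. The sketch below is illustrative: the helper names `independence_test` and `goodness_of_fit`, the `min_expected=5` rule of thumb, and all counts are assumptions made up for the example. The Fisher fallback is restricted to 2x2 tables here.

```python
import numpy as np
from scipy import stats


def independence_test(table, min_expected=5):
    """Chi-square test of independence for a contingency table.

    Falls back to Fisher's exact test for 2x2 tables whose smallest
    expected count is below min_expected (an assumed rule of thumb).
    """
    table = np.asarray(table)
    chi2, p, dof, expected = stats.chi2_contingency(table)
    if table.shape == (2, 2) and expected.min() < min_expected:
        _, p = stats.fisher_exact(table)
        return "Fisher's exact", p
    return "chi-square independence", p


def goodness_of_fit(observed, expected_props):
    """Chi-square goodness-of-fit against expected proportions."""
    observed = np.asarray(observed, dtype=float)
    # Scale the proportions so expected counts sum to the observed total.
    expected = np.asarray(expected_props, dtype=float) * observed.sum()
    stat, p = stats.chisquare(observed, f_exp=expected)
    return stat, p


# Hypothetical counts, for illustration only.
large_table = [[30, 70], [45, 55]]   # all expected counts well above 5
small_table = [[3, 1], [1, 3]]       # expected counts of 2 per cell
print(independence_test(large_table))
print(independence_test(small_table))
print(goodness_of_fit([48, 35, 17], [0.5, 0.3, 0.2]))
```

The fallback is decided by the expected counts that `chi2_contingency` returns, not by the observed counts, since the chi-square approximation depends on the expected values.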

For Three or More Groups

The decision tree above covers two-group comparisons. For three or more groups, you need ANOVA, which is covered in the next chapter.

Statistical Test Decision Tree

Interactive decision tree: answer questions about your data (paired? normal? equal variance?) and arrive at the recommended test with a brief explanation. Question 1 asks whether you are comparing groups (t-tests, ANOVA, Mann-Whitney) or testing a relationship (correlation, regression).

💭Reflection

For each scenario, identify the correct test and explain why: (1) Comparing click-through rates (binary outcome) for two ad creatives shown to different users. (2) Comparing model accuracy scores across 10 benchmark datasets for two models. (3) Testing whether the distribution of error types produced by a model (Type A, Type B, Type C) matches the expected distribution.