The Chi-Square Test

The tests so far compare means or distributions of continuous data. The chi-square test is fundamentally different — it's for categorical data. Are two categorical variables associated? Does a categorical variable's distribution match what you'd expect?

Chi-Square Test of Independence

Question: Are two categorical variables associated?

Use it when: You have two categorical variables and want to know if they're related — or independent.

Process:

  1. Build a contingency table of observed frequencies (rows × columns = one cell per category combination).
  2. Compute expected frequencies under independence: (row total × column total) / grand total.
  3. Chi-square statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$
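The three steps above can be sketched by hand with NumPy. This is a minimal illustration, not a replacement for a library routine. It uses the device-by-conversion counts from this section's walkthrough; the variable names are arbitrary, and the degrees-of-freedom formula (r - 1)(c - 1) is the standard one for an r × c table.

```python
import numpy as np
from scipy.stats import chi2

# Step 1: observed counts (rows: Mobile, Desktop, Tablet; columns: converted Yes, No).
observed = np.array([[45, 155],
                     [98, 102],
                     [22, 78]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (3, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
grand_total = observed.sum()

# Step 2: expected count per cell under independence.
expected = row_totals * col_totals / grand_total

# Step 3: per-cell contributions (O - E)^2 / E, summed into the statistic.
contributions = (observed - expected) ** 2 / expected
chi2_stat = contributions.sum()

# Degrees of freedom for an r x c table: (r - 1) * (c - 1).
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = chi2.sf(chi2_stat, dof)  # upper-tail probability

print(expected)
print(f"chi2 = {chi2_stat:.2f}, dof = {dof}, p = {p_value:.2e}")
```

For these counts the statistic comes out near 38.6 with 2 degrees of freedom, so the p-value is very small. The Desktop row contributes the most, because its observed conversions (98) sit furthest from the expected 66.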

Examples: Is device type associated with conversion? Is gender associated with product preference? Is neighborhood associated with churn?

In Python: scipy.stats.chi2_contingency(contingency_table)
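A short usage sketch of that call, assuming the device-by-conversion counts from this section's walkthrough as input. The result unpacks into the statistic, the p-value, the degrees of freedom, and the expected-count table.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: Mobile, Desktop, Tablet; columns: converted Yes, No.
contingency_table = np.array([[45, 155],
                              [98, 102],
                              [22, 78]])

stat, p_value, dof, expected = chi2_contingency(contingency_table)

print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p_value:.2e}")
print(expected)  # expected counts under independence
```

By default SciPy applies Yates' continuity correction only when the table has one degree of freedom (a 2×2 table), so this 3×2 example is uncorrected.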

Chi-Square Test of Independence — Step-by-Step
Step 1 — Build the observed contingency table

Start with a contingency table — count how many observations fall into each combination of categories. Row totals and column totals let you compute what you'd expect if the variables were truly independent.

Device \ Converted    Yes    No     Total
Mobile                45     155    200
Desktop               98     102    200
Tablet                22     78     100
Total                 165    335    500

Chi-Square Goodness-of-Fit

Question: Does the distribution of a single categorical variable match an expected distribution?

Examples: Are customer arrivals uniformly distributed across days of the week? Are dice rolls actually uniform? Does the demographic distribution of your users match the national distribution?

In Python: scipy.stats.chisquare(observed, expected)
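A short usage sketch of that call, assuming the weekday arrival counts from this section's walkthrough (300 observations against a uniform expectation of 60 per day). Passing the expected counts through the `f_exp` keyword makes the intent explicit.

```python
from scipy.stats import chisquare

observed = [42, 38, 65, 55, 100]   # Mon..Fri arrival counts
expected = [60, 60, 60, 60, 60]    # uniform: 300 observations / 5 days

stat, p_value = chisquare(observed, f_exp=expected)

print(f"chi2 = {stat:.2f}, p = {p_value:.2e}")
```

If `f_exp` is omitted, `chisquare` assumes all categories are equally likely, which would give the same result here. The observed and expected totals should match; recent SciPy versions raise an error when they do not.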

Chi-Square Goodness-of-Fit — Step-by-Step
Step 1 — Compare observed counts to expected counts

Compare the observed frequencies to the expected frequencies under your hypothesis. If the data match the hypothesized distribution, each observed bar should be roughly the same height as its expected bar.

[Bar chart: observed vs. expected counts for Mon through Fri; expected is uniform at 60 per day]
Category    Observed (O)    Expected (E)    O − E
Mon         42              60.0            -18.0
Tue         38              60.0            -22.0
Wed         65              60.0            +5.0
Thu         55              60.0            -5.0
Fri         100             60.0            +40.0
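To see which categories drive the statistic, the per-category terms (O − E)² / E can be computed directly from the table above. This is a small sketch with arbitrary variable names, assuming the uniform expectation of 60 per day.

```python
import numpy as np

days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
observed = np.array([42, 38, 65, 55, 100])

# Uniform hypothesis: every day gets an equal share of the total.
expected = np.full(len(observed), observed.sum() / len(observed))

# Each category's contribution to the chi-square statistic.
contributions = (observed - expected) ** 2 / expected
chi2_stat = contributions.sum()
top_day = days[int(np.argmax(contributions))]

for day, c in zip(days, contributions):
    print(f"{day}: {c:.2f}")
print(f"chi2 = {chi2_stat:.2f}, largest contribution: {top_day}")
```

Friday accounts for roughly 26.7 of the total of about 41.0, so most of the departure from uniformity comes from that one day.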

Total observations: 300

Assumptions That Get Missed

  • Independence of observations.
  • Sufficient sample size: As a rule of thumb, every cell of the contingency table should have an expected count of at least 5. A common relaxation tolerates up to 20% of cells with expected counts below 5, as long as none falls below 1.
  • If cells have very low expected counts, use Fisher's exact test instead — it doesn't rely on the chi-square approximation and works well with small samples.
  • Mutually exclusive categories: Each data point must fit into exactly one category.
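The sample-size check in the list above can be automated before choosing a test. The sketch below is one possible approach, not a standard recipe. `expected_count_summary` is a hypothetical helper, the 2×2 counts are made up for illustration, and the decision rule combines the 20% guideline with the commonly cited extra condition that no expected count falls below 1 (an assumption added here).

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact
from scipy.stats.contingency import expected_freq

def expected_count_summary(table):
    """Hypothetical helper: smallest expected count and share of cells below 5."""
    expected = expected_freq(np.asarray(table))
    return float(expected.min()), float((expected < 5).mean())

# Made-up 2x2 counts, purely for illustration.
table = np.array([[3, 9],
                  [7, 2]])

min_expected, frac_below_5 = expected_count_summary(table)

if min_expected < 1 or frac_below_5 > 0.20:
    # Expected counts are too small for the chi-square approximation.
    _, p_value = fisher_exact(table)
    test_used = "fisher_exact"
else:
    _, p_value, _, _ = chi2_contingency(table)
    test_used = "chi2_contingency"

print(f"min expected = {min_expected:.2f}, share below 5 = {frac_below_5:.0%}")
print(f"{test_used}: p = {p_value:.4f}")
```

In many SciPy versions `fisher_exact` accepts only 2×2 tables, so a larger table with sparse cells may need a newer release, a different tool, or merged categories.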

Checkpoint

You want to test whether users from different countries (US, UK, Germany, France) have different rates of opting into push notifications (Yes/No). You build a 4×2 contingency table. One cell has an expected count of 3. What should you do?