The Chi-Square Test
The tests so far compare means or distributions of continuous data. The chi-square test is fundamentally different — it's for categorical data. Are two categorical variables associated? Does a categorical variable's distribution match what you'd expect?
Chi-Square Test of Independence
Question: Are two categorical variables associated?
Use it when: You have two categorical variables and want to know if they're related — or independent.
Process:
- Build a contingency table of observed frequencies (rows × columns = one cell per category combination).
- Compute expected frequencies under independence: (row total × column total) / grand total.
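- Chi-square statistic: χ² = Σ (O − E)² / E, summed over all cells, where O is the observed count and E is the expected count in each cell.
- Compare χ² to a chi-square distribution with (rows − 1) × (columns − 1) degrees of freedom to get the p-value.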
Examples: Is device type associated with conversion? Is gender associated with product preference? Is neighborhood associated with churn?
In Python: scipy.stats.chi2_contingency(contingency_table)
Start with a contingency table — count how many observations fall into each combination of categories. Row totals and column totals let you compute what you'd expect if the variables were truly independent.
| Device ↓ / Converted → | Yes | No | Total |
|---|---|---|---|
| Mobile | 45 | 155 | 200 |
| Desktop | 98 | 102 | 200 |
| Tablet | 22 | 78 | 100 |
| Total | 165 | 335 | 500 |
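The steps above can be sketched in Python using the device table. This is a minimal illustration, not a prescribed workflow: the variable names are our own, and the manual calculation is included only to show that it agrees with what `scipy.stats.chi2_contingency` reports.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from the device table (columns: converted Yes, No).
observed = np.array([
    [45, 155],   # Mobile
    [98, 102],   # Desktop
    [22, 78],    # Tablet
])

# Expected counts under independence: (row total x column total) / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
grand_total = observed.sum()
expected_manual = row_totals * col_totals / grand_total

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# The same test via SciPy; the result unpacks as (statistic, p-value, dof, expected).
chi2, p_value, dof, expected = chi2_contingency(observed)

print("expected counts:\n", expected)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.2e}")
```

For this table the degrees of freedom are (3 − 1) × (2 − 1) = 2. SciPy applies Yates' continuity correction only when there is a single degree of freedom, so the manual statistic and the library statistic agree here.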
Chi-Square Goodness-of-Fit
Question: Does the distribution of a single categorical variable match an expected distribution?
Examples: Are customer arrivals uniformly distributed across days of the week? Are dice rolls actually uniform? Does the demographic distribution of your users match the national distribution?
In Python: scipy.stats.chisquare(observed, expected)
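Compare the observed frequencies to the frequencies expected under your hypothesis. If the hypothesis holds, observed and expected counts should be close in every category. In the example below the expected distribution is uniform, so each weekday is expected to receive 60 of the 300 observations.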
| Category | Observed (O) | Expected (E) | O − E |
|---|---|---|---|
| Mon | 42 | 60.0 | -18.0 |
| Tue | 38 | 60.0 | -22.0 |
| Wed | 65 | 60.0 | +5.0 |
| Thu | 55 | 60.0 | -5.0 |
| Fri | 100 | 60.0 | +40.0 |
Total observations: 300
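The weekday table can be tested in a few lines. The sketch below assumes a uniform expected distribution, as in the table, and uses our own variable names. It also prints each category's contribution, (O − E)² / E, to show which days drive the statistic.

```python
import numpy as np
from scipy.stats import chisquare

days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
observed = np.array([42, 38, 65, 55, 100])

# Uniform hypothesis: every day gets an equal share of the 300 observations.
expected = np.full(len(observed), observed.sum() / len(observed))  # 60.0 each

# Per-category contribution to the statistic: (O - E)^2 / E.
contributions = (observed - expected) ** 2 / expected

# Goodness-of-fit test; degrees of freedom default to (number of categories - 1).
statistic, p_value = chisquare(f_obs=observed, f_exp=expected)

for day, contribution in zip(days, contributions):
    print(f"{day}: {contribution:.2f}")
print(f"chi2 = {statistic:.2f}, p = {p_value:.2e}")
```

Note that `chisquare` expects the observed and expected counts to have the same total, so when the expected distribution is given as proportions, multiply them by the number of observations first.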
Assumptions That Get Missed
- Independence of observations.
- Sufficient sample size: The chi-square approximation needs large enough expected (not observed) counts. A common rule of thumb is that no more than 20% of cells should have an expected frequency below 5, and no cell should have an expected frequency below 1.
- If cells have very low expected counts, use Fisher's exact test instead — it doesn't rely on the chi-square approximation and works well with small samples.
- Mutually exclusive categories: Each data point must fit into exactly one category.
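The sample-size assumption can be checked before running the test. The helper below is a hypothetical sketch (the name `check_expected_counts` and the small table are made up for illustration). It assumes the common rule of thumb that at most 20% of cells may have an expected count below 5 and that none may fall below 1, and it uses `scipy.stats.contingency.expected_freq` to compute expected counts under independence.

```python
import numpy as np
from scipy.stats.contingency import expected_freq

def check_expected_counts(observed):
    """Summarize whether expected counts satisfy the rule of thumb (hypothetical helper)."""
    expected = expected_freq(np.asarray(observed))
    share_below_5 = float((expected < 5).mean())   # fraction of cells with E < 5
    min_expected = float(expected.min())
    return {
        "min_expected": min_expected,
        "share_below_5": share_below_5,
        "ok": min_expected >= 1 and share_below_5 <= 0.20,
    }

# Made-up small-sample table, for illustration only.
small_table = [
    [12, 5],
    [9, 4],
    [3, 2],
]
small_result = check_expected_counts(small_table)
print(small_result)

# The device-by-conversion table from earlier has large expected counts.
device_result = check_expected_counts([[45, 155], [98, 102], [22, 78]])
print(device_result)
```

When the check fails, typical options are to merge sparse categories where that makes sense, collect more data, or switch to an exact test.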
You want to test whether users from different countries (US, UK, Germany, France) have different rates of opting into push notifications (Yes/No). You build a 4×2 contingency table. One cell has an expected count of 3. What should you do?