Independent Samples and Welch's t-Test
Student's Independent Samples t-Test
Question: Are the means of two independent groups significantly different?
Use it when: You have two separate groups with no overlap in subjects, and you want to compare their means. Common in A/B tests.
Assumptions: Independent observations, approximate normality within each group, and equal variances (homogeneity of variance).
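Independence is a property of how the data were collected, so it can't be checked from the numbers. The other two assumptions can be screened. The sketch below uses simulated groups (invented for illustration) with SciPy's `shapiro` (normality) and `levene` (equal variances). A non-significant result does not prove an assumption holds; it only means the test found no strong evidence against it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Simulated data, for illustration only
group1 = rng.normal(loc=50, scale=10, size=40)
group2 = rng.normal(loc=52, scale=10, size=40)

# Normality within each group: Shapiro-Wilk (H0: the sample comes from a normal distribution)
for name, g in [("group1", group1), ("group2", group2)]:
    w, p = stats.shapiro(g)
    print(f"{name}: Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")

# Homogeneity of variance: Levene's test (H0: equal variances).
# center="median" (SciPy's default) is the Brown-Forsythe variant, which is
# more robust when the data are not normal.
lev = stats.levene(group1, group2, center="median")
print(f"Levene: W = {lev.statistic:.3f}, p = {lev.pvalue:.3f}")
```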
In Python: scipy.stats.ttest_ind(group1, group2)
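To show what `ttest_ind` computes with its default `equal_var=True`, here is a from-scratch pooled-variance version checked against SciPy. The helper name `pooled_t_test` and the simulated data are illustrative assumptions, not part of any library.

```python
import numpy as np
from scipy import stats

def pooled_t_test(a, b):
    """Hypothetical helper: Student's two-sample t-test (pooled variance, two-sided)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    # Pooled variance: df-weighted average of the two sample variances (ddof=1)
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (a.mean() - b.mean()) / se
    df = n1 + n2 - 2
    p = 2 * stats.t.sf(abs(t), df)  # two-sided p-value
    return t, df, p

rng = np.random.default_rng(0)
group1 = rng.normal(loc=50, scale=10, size=30)  # simulated data
group2 = rng.normal(loc=55, scale=10, size=30)

t, df, p = pooled_t_test(group1, group2)
res = stats.ttest_ind(group1, group2)  # equal_var=True is the default
print(f"manual: t = {t:.4f}, df = {df}, p = {p:.4f}")
print(f"scipy:  t = {res.statistic:.4f}, p = {res.pvalue:.4f}")
```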
Welch's t-Test (Unequal Variances)
Question: Same as Student's — but used when the equal-variance assumption may not hold.
Use it when: You have two independent groups, but their variances differ or you aren't sure they're equal. Welch's test uses adjusted degrees of freedom (the Welch–Satterthwaite equation) that account for unequal variances.
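For reference, here is the standard form of Welch's statistic and the Welch–Satterthwaite degrees of freedom. The notation is introduced here: x̄ᵢ, sᵢ², and nᵢ are the sample mean, sample variance, and size of group i.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
```

When the two sample variances and sample sizes are equal, ν reduces to n₁ + n₂ − 2, which is the Student's value.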
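Recommendation: Default to Welch's. When the variances really are equal, it loses very little power relative to Student's. When they aren't, Student's test can be badly miscalibrated, especially if the group sizes also differ. Its false-positive rate can drift far from the nominal α, so its p-values can't be trusted.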
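A quick Monte Carlo check makes the recommendation concrete. In this sketch the null hypothesis is true: both groups share the same mean. The scenario is an assumption chosen to stress Student's test, with a small group (n = 10, SD = 4) compared against a large group (n = 40, SD = 1). Each row of the arrays is one simulated experiment, and the code counts how often each test rejects at α = 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims = 5000
alpha = 0.05

# H0 is true (both means are 0), but the SMALLER group has a 4x larger SD.
small_noisy = rng.normal(loc=0.0, scale=4.0, size=(n_sims, 10))
large_quiet = rng.normal(loc=0.0, scale=1.0, size=(n_sims, 40))

# ttest_ind works along an axis: each row is one simulated A/B test
p_student = stats.ttest_ind(small_noisy, large_quiet, axis=1, equal_var=True).pvalue
p_welch = stats.ttest_ind(small_noisy, large_quiet, axis=1, equal_var=False).pvalue

student_rate = np.mean(p_student < alpha)  # fraction of false positives
welch_rate = np.mean(p_welch < alpha)
print(f"False-positive rate, Student's: {student_rate:.3f}")
print(f"False-positive rate, Welch's:   {welch_rate:.3f}")
```

Student's test rejects far more often than 5% here because its pooled variance is dominated by the large, quiet group. If you reverse which group is noisier, Student's test becomes overly conservative instead and loses power. In both cases, Welch's test stays close to the nominal rate.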
In Python: scipy.stats.ttest_ind(group1, group2, equal_var=False)
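The sketch below computes Welch's t-statistic and the Welch–Satterthwaite degrees of freedom by hand and compares the result with `ttest_ind(..., equal_var=False)`. The helper `welch_t_test` and the simulated groups are hypothetical illustrations.

```python
import numpy as np
from scipy import stats

def welch_t_test(a, b):
    """Hypothetical helper: Welch's two-sample t-test (two-sided)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    v1 = a.var(ddof=1) / n1  # squared standard error of mean 1
    v2 = b.var(ddof=1) / n2  # squared standard error of mean 2
    t = (a.mean() - b.mean()) / np.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p = 2 * stats.t.sf(abs(t), df)  # two-sided p-value
    return t, df, p

rng = np.random.default_rng(7)
group1 = rng.normal(loc=65, scale=5, size=100)   # simulated data
group2 = rng.normal(loc=67, scale=15, size=100)

t, df, p = welch_t_test(group1, group2)
res = stats.ttest_ind(group1, group2, equal_var=False)
print(f"manual: t = {t:.4f}, df = {df:.1f}, p = {p:.4f}")
print(f"scipy:  t = {res.statistic:.4f}, p = {res.pvalue:.4f}")
```

The fractional df (well below the 198 that Student's test would use here) is how Welch's test pays for not assuming equal variances.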
Figure (interactive): Welch's t-test for New vs. Returning customers. Dots mark group means, horizontal bars span ±1.96 SD, and a dashed line marks the mean difference. The t-distribution plot shades the p-value tails and marks the observed t-statistic. With New customers (mean = 65) and Returning customers (mean = 67) at an SD ratio of about 3.0×, p = 0.2083 ≥ 0.05, so we fail to reject H₀: no significant difference was detected.
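You can reproduce the figure's behavior from summary statistics with `scipy.stats.ttest_ind_from_stats`. This sketch holds New customers fixed and varies only the Returning customers' SD. The fixed values (means 65 and 67, n = 100 each, New SD = 5) are assumptions chosen to match the 3× ratio in the figure. The SDs are sample SDs (ddof=1), which is what SciPy expects.

```python
from scipy import stats

sd_new = 5.0
p_values = []
for sd_returning in [5.0, 10.0, 15.0, 20.0, 25.0]:
    res = stats.ttest_ind_from_stats(
        mean1=65.0, std1=sd_new, nobs1=100,
        mean2=67.0, std2=sd_returning, nobs2=100,
        equal_var=False,  # Welch's t-test
    )
    p_values.append(res.pvalue)
    ratio = sd_returning / sd_new
    print(f"SD ratio {ratio:.1f}x: t = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```

The mean difference never changes. The growing spread alone shrinks |t| and pushes the p-value up.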
Worked Example: New vs. Returning Customers
Comparing average purchase amounts: new customers (mean = 65, SD = 5, n = 100) vs. returning customers (mean = 67, SD = 15, n = 100).
Notice the standard deviations: 5 vs. 15. That's a 3× difference in spread. Levene's test would likely flag it, and Welch's is clearly the right choice here.
Running Welch's t-test, you'll find that the difference in means is not statistically significant. The high variance among returning customers inflates the standard error of their mean, which widens the confidence interval for the difference. The lesson: a visible gap between sample means is not necessarily a significant difference when within-group variance is high.
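The worked example can be checked from its summary statistics alone. This sketch assumes means of 65 and 67, SDs of 5 and 15, and n = 100 per group. It runs both variants with `scipy.stats.ttest_ind_from_stats`, and the Welch p-value should come out at roughly 0.21, in line with the interactive figure.

```python
from scipy import stats

# Assumed summary statistics for the worked example (sample SDs, ddof=1)
new = {"mean": 65.0, "std": 5.0, "nobs": 100}
returning = {"mean": 67.0, "std": 15.0, "nobs": 100}

welch = stats.ttest_ind_from_stats(
    new["mean"], new["std"], new["nobs"],
    returning["mean"], returning["std"], returning["nobs"],
    equal_var=False,
)
student = stats.ttest_ind_from_stats(
    new["mean"], new["std"], new["nobs"],
    returning["mean"], returning["std"], returning["nobs"],
    equal_var=True,
)
print(f"Welch's:   t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
print(f"Student's: t = {student.statistic:.3f}, p = {student.pvalue:.4f}")
```

With equal group sizes, the two t-statistics are identical, because the pooled standard error equals Welch's. Only the degrees of freedom differ (198 vs. about 121), so the p-values land close together. The gap between the tests widens when the group sizes are unequal as well.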
You're comparing average response time between two groups in an A/B test. Levene's test returns p = 0.01, indicating significantly unequal variances. Which test should you use?