Independent Samples and Welch's t-Test

Student's Independent Samples t-Test

Question: Are the means of two independent groups significantly different?

Use it when: Two separate groups, no overlap between subjects, means compared. Common in A/B tests.

Assumptions: Independence, normality within each group, equal variances (homogeneity).

In Python: scipy.stats.ttest_ind(group1, group2)
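A minimal sketch of the call above on simulated data. The group means, spread, sample sizes, and seed are hypothetical values chosen for illustration; both groups are drawn with the same standard deviation, which is the setting Student's test assumes.

```python
import numpy as np
from scipy import stats

# Hypothetical A/B data: both groups share the same spread (SD = 10).
rng = np.random.default_rng(42)
group1 = rng.normal(loc=50.0, scale=10.0, size=80)
group2 = rng.normal(loc=55.0, scale=10.0, size=80)

# equal_var=True is the default, so this is Student's (pooled-variance) test.
student = stats.ttest_ind(group1, group2)
print(f"t = {student.statistic:.3f}, p = {student.pvalue:.4f}")
```

The result object exposes `statistic` and `pvalue`; compare `pvalue` against your chosen α to decide whether to reject H₀.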

Welch's t-Test (Unequal Variances)

Question: Same as Student's — but used when the equal-variance assumption may not hold.

Use it when: Two independent groups, but variance in the groups differs or you're unsure. Welch's computes an adjusted degrees of freedom (Welch-Satterthwaite equation) that accounts for unequal variances.

Recommendation: Default to Welch's. The cost of using it when variances are actually equal is small (a slight loss of power). The cost of using Student's when they're not can be substantial: the p-values can be badly miscalibrated, particularly when the group sizes also differ.
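One way to see this trade-off is a small simulation. The sketch below assumes a specific setup that is not from the text above: both groups have the same true mean, the smaller group (n = 20) has the larger spread (SD = 15), and the larger group (n = 100) has SD = 5. Because H₀ is true by construction, every rejection is a false positive.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, alpha = 2000, 0.05

# Same true mean in both groups, so any rejection is a false positive.
# Assumed setup: the smaller group has the larger spread.
small_noisy = rng.normal(loc=0.0, scale=15.0, size=(n_trials, 20))
large_quiet = rng.normal(loc=0.0, scale=5.0, size=(n_trials, 100))

# axis=1 runs one test per simulated experiment (one row each).
p_student = stats.ttest_ind(small_noisy, large_quiet, axis=1, equal_var=True).pvalue
p_welch = stats.ttest_ind(small_noisy, large_quiet, axis=1, equal_var=False).pvalue

student_rate = float(np.mean(p_student < alpha))
welch_rate = float(np.mean(p_welch < alpha))
print(f"Student's false-positive rate: {student_rate:.3f}")
print(f"Welch's false-positive rate:   {welch_rate:.3f}")
```

Under these assumptions, Student's test should reject far more often than the nominal 5%, while Welch's should stay close to it. Swapping which group is noisier pushes Student's in the opposite direction, making it overly conservative.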

In Python: scipy.stats.ttest_ind(group1, group2, equal_var=False)
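A short sketch comparing the two calls on the same simulated data. The data are hypothetical: similar means, but the second group is three times as variable as the first.

```python
import numpy as np
from scipy import stats

# Hypothetical data: group2 has three times the standard deviation of group1.
rng = np.random.default_rng(7)
group1 = rng.normal(loc=65.0, scale=5.0, size=100)
group2 = rng.normal(loc=67.0, scale=15.0, size=100)

welch = stats.ttest_ind(group1, group2, equal_var=False)
student = stats.ttest_ind(group1, group2)  # pooled variance, for comparison

print(f"Welch:   t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
print(f"Student: t = {student.statistic:.3f}, p = {student.pvalue:.4f}")
```

With equal group sizes the two t-statistics coincide, because the pooled and unpooled standard errors are algebraically identical when n₁ = n₂. Only the degrees of freedom differ, so Welch's p-value is slightly larger. The two tests diverge more sharply when group sizes are unequal.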

Independent Samples t-Test (Welch's)

[Figure: Group means and spread (±1.96 SD) for new customers and returning customers. Dots = means. Horizontal bars = ±1.96 SD range. Dashed line = mean difference.]

Unequal variances detected: SD ratio ≈ 3.0×. Welch's t-test (used here) adjusts for this automatically via the Welch-Satterthwaite df.

New customers: mean (x̄₁) = 65.0, SD (s₁) = 5.0, size (n₁) = 100

Returning customers: mean (x̄₂) = 67.0, SD (s₂) = 15.0, size (n₂) = 100
[Figure: Welch's t-distribution (df = 120.7). Shaded tails = p-value region. Orange marker = observed t-statistic.]

Results

Mean diff (x̄₁ − x̄₂): -2.00
Standard error: 1.5811
t-statistic: -1.265
p-value (two-tailed): 0.2083

Interpretation (α = 0.05)

p = 0.2083 ≥ 0.05 — fail to reject H₀. No significant difference between New customers (mean = 65) and Returning customers (mean = 67) was detected.

How it's computed

1. SE = √(s₁²/n₁ + s₂²/n₂) = √(5²/100 + 15²/100) = 1.5811
2. t = (x̄₁ − x̄₂) / SE = (65 − 67) / 1.5811 = -1.265
3. Welch df = (s₁²/n₁ + s₂²/n₂)² / [(s₁²/n₁)²/(n₁−1) + (s₂²/n₂)²/(n₂−1)] = 120.7
4. p = 0.2083, not significant at α = 0.05
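The four steps can be sketched directly in Python from the summary statistics. The variable names are illustrative, and the p-value uses `scipy.stats.t.sf` (the survival function of the t-distribution), doubled for a two-tailed test.

```python
import math
from scipy import stats

# Summary statistics from the example above.
mean1, sd1, n1 = 65.0, 5.0, 100   # new customers
mean2, sd2, n2 = 67.0, 15.0, 100  # returning customers

# Step 1: standard error of the difference in means.
v1, v2 = sd1**2 / n1, sd2**2 / n2
se = math.sqrt(v1 + v2)

# Step 2: t-statistic.
t_stat = (mean1 - mean2) / se

# Step 3: Welch-Satterthwaite degrees of freedom.
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Step 4: two-tailed p-value from the t-distribution.
p_value = 2 * stats.t.sf(abs(t_stat), df)

print(f"SE = {se:.4f}, t = {t_stat:.3f}, df = {df:.1f}, p = {p_value:.4f}")
```

The printed values should agree with the results listed above, up to rounding.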


Worked Example: New vs. Returning Customers

Comparing average purchase amounts: new customers (mean = $65, SD = $5, n = 100) vs. returning customers (mean = $67, SD = $15, n = 100).

Notice the standard deviations: $5 vs. $15. That's a 3x difference in spread. Levene's test would likely flag this, and Welch's is clearly the right choice here.
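A hedged sketch of that variance check. The raw purchase amounts are not given in this example, so the arrays below are simulated stand-ins drawn from normal distributions with the stated means and SDs; the seed and the normality of the data are assumptions.

```python
import numpy as np
from scipy import stats

# Simulated stand-ins for the raw data (only summary statistics are given).
rng = np.random.default_rng(1)
new_customers = rng.normal(loc=65.0, scale=5.0, size=100)
returning_customers = rng.normal(loc=67.0, scale=15.0, size=100)

# Levene's test: H0 is that the groups have equal variances.
levene = stats.levene(new_customers, returning_customers)
sd_ratio = returning_customers.std(ddof=1) / new_customers.std(ddof=1)

print(f"SD ratio = {sd_ratio:.2f}")
print(f"Levene W = {levene.statistic:.2f}, p = {levene.pvalue:.2g}")
```

Note that `scipy.stats.levene` centers on the median by default (the Brown-Forsythe variant), which is more robust to skewed data than centering on the mean. A small p-value here is evidence against equal variances, which supports using Welch's test.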

Running Welch's t-test gives t = -1.265 and p = 0.2083, so the means are not significantly different at α = 0.05. The high variance among returning customers inflates the standard error of the difference, which widens the confidence interval around it. The lesson: a visible gap between sample means does not necessarily indicate a significant difference when variance is high.
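When only summary statistics are available, `scipy.stats.ttest_ind_from_stats` runs the same test without raw data. The sketch below also computes a 95% confidence interval for the mean difference by hand from the Welch df. It assumes the reported SDs are sample standard deviations (ddof = 1), which is what the SciPy function expects.

```python
from scipy import stats

# Summary statistics from the worked example (SDs assumed to be sample SDs).
mean_new, sd_new, n_new = 65.0, 5.0, 100
mean_ret, sd_ret, n_ret = 67.0, 15.0, 100

welch = stats.ttest_ind_from_stats(
    mean1=mean_new, std1=sd_new, nobs1=n_new,
    mean2=mean_ret, std2=sd_ret, nobs2=n_ret,
    equal_var=False,
)

# 95% confidence interval for the mean difference, using the Welch df.
v_new, v_ret = sd_new**2 / n_new, sd_ret**2 / n_ret
se = (v_new + v_ret) ** 0.5
df = (v_new + v_ret) ** 2 / (v_new**2 / (n_new - 1) + v_ret**2 / (n_ret - 1))
t_crit = stats.t.ppf(0.975, df)
diff = mean_new - mean_ret
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se

print(f"t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
print(f"95% CI for the difference: ({ci_low:.2f}, {ci_high:.2f})")
```

The interval spans zero, which is the confidence-interval view of the same conclusion: the data are compatible with no difference between the groups.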

Checkpoint

You're comparing average response time between two groups in an A/B test. Levene's test returns p = 0.01, indicating significantly unequal variances. Which test should you use?