Distributions Across the ML Stack
Rather than memorizing every formula, you should focus on recognizing which distribution describes your situation. The skill that pays off: seeing a problem and knowing what shape the uncertainty should take.
Where Each Distribution Lives in Practice
- Bernoulli / Binomial: Underlie logistic regression and any binary classification problem. Click-through rates, conversion rates, fraud detection, disease diagnosis.
- Poisson: Count-based features and outcomes. Call volumes, request counts, defect counts, rare event modeling. Also the foundation for Poisson regression.
- Normal: The assumption behind many parametric statistical tests. Residuals of linear regression (when assumptions hold). Weight initialization in neural networks.
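- Exponential: Survival analysis, churn prediction, reliability modeling, time-to-failure. The simplest parametric survival model, with a constant hazard rate; Cox proportional hazards models relax this by leaving the baseline hazard unspecified.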
- Uniform: Random weight initialization, A/B test group assignment, Monte Carlo simulation. Also the raw random draws that are thresholded to produce Bernoulli dropout masks.
- t-distribution: Hypothesis tests about means when population variance is unknown. Confidence intervals on regression coefficients.
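To get a feel for the shapes listed above, it can help to draw samples from each distribution and compare their summary statistics. The following is a minimal sketch using NumPy's random generator; every parameter value (p = 0.3, rate 4, mean 2, 5 degrees of freedom, and so on) is an illustrative assumption rather than something taken from this lesson.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is repeatable
n = 100_000                          # number of draws per distribution

# All parameter values below are illustrative assumptions.
samples = {
    "Bernoulli": rng.binomial(n=1, p=0.3, size=n),      # single yes/no event
    "Binomial": rng.binomial(n=10, p=0.3, size=n),      # successes in 10 trials
    "Poisson": rng.poisson(lam=4.0, size=n),            # events per interval
    "Normal": rng.normal(loc=0.0, scale=1.0, size=n),   # symmetric noise
    "Exponential": rng.exponential(scale=2.0, size=n),  # waiting time, mean 2
    "Uniform": rng.uniform(low=0.0, high=1.0, size=n),  # no preferred value
    "t": rng.standard_t(df=5, size=n),                  # heavier tails than Normal
}

# Print the empirical mean and variance of each sample.
for name, draws in samples.items():
    print(f"{name:12s} mean={draws.mean():7.3f}  var={draws.var():7.3f}")
```

With these assumed parameters, the Poisson sample should show a mean and variance that are both close to 4, while the Exponential sample should have a mean near 2 and a noticeably larger variance.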
A useful heuristic for choosing:
- Is the outcome binary? → Bernoulli (single event) or Binomial (count of successes).
- Is the outcome a count of events in a fixed interval of time or space? → Poisson.
- Is it time to an event? → Exponential (or Weibull for more flexibility).
- Is it a continuous, roughly symmetric measurement with no other known structure? → Normal (especially for residuals and errors).
- Do you genuinely have no reason to prefer any value? → Uniform.
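The heuristic above can be written down as a small lookup function. This is only a sketch of the checklist, not a substitute for inspecting the data: the function name `suggest_distribution` and the outcome labels it accepts (`"binary"`, `"count"`, `"time_to_event"`, `"continuous"`, `"no_preference"`) are hypothetical names chosen for illustration.

```python
def suggest_distribution(kind: str, single_event: bool = True) -> str:
    """Return a first-guess distribution name for a described outcome.

    `kind` is one of the hypothetical labels used in this sketch;
    `single_event` only matters for binary outcomes.
    """
    kind = kind.lower()
    if kind == "binary":
        # One trial -> Bernoulli; a count of successes over trials -> Binomial.
        return "Bernoulli" if single_event else "Binomial"
    if kind == "count":
        return "Poisson"
    if kind == "time_to_event":
        # Weibull is the more flexible alternative mentioned in the checklist.
        return "Exponential"
    if kind == "continuous":
        return "Normal"
    if kind == "no_preference":
        return "Uniform"
    raise ValueError(f"unknown outcome kind: {kind!r}")


# Example: requests arriving at a server per minute are a count per interval.
print(suggest_distribution("count"))
# Example: number of converted users out of a fixed cohort.
print(suggest_distribution("binary", single_event=False))
```

Treat the returned name as a starting hypothesis to check against the data, for example by comparing the sample mean and variance for counts.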
💭Reflection
Identify the most appropriate distribution for the outcome variable and explain why: predicting the number of support tickets a customer will submit next month.
💭Reflection
Identify the most appropriate distribution for the outcome variable and explain why: predicting whether an email is spam.
💭Reflection
Identify the most appropriate distribution for the outcome variable and explain why: predicting how long a customer will remain subscribed before canceling.