Non-Probability Sampling
In non-probability sampling, units are not selected with known probabilities, so the resulting samples can be biased in ways that are hard to characterize, and statistical claims based on them have weaker external validity. Prefer probability sampling whenever it is feasible, and be honest about the limitations when it is not.
- Convenience sampling: Select subjects because they're convenient — nearby, already in your database, responded to your email. The classic research example is recruiting your own students.
- Purposive (judgmental) sampling: Researchers hand-pick subjects they believe are most representative. Selection bias is baked in — your judgment of who is "representative" shapes the conclusions.
- Snowball sampling: Participants recruit other participants. Useful for hard-to-reach populations, but you often end up with a homogeneous chain of friends recruiting friends. It is a legitimate method in research on marginalized communities, and a poor fit for any analysis that assumes the sample is representative.
- Quota sampling: Divide the population into subgroups and non-randomly select observations until each subgroup's quota is met. It looks as structured as stratified sampling, but selection within each subgroup isn't random.
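To make the cost of convenience concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the population of students, the `near_front` flag marking who is easy to recruit, and the study-hour averages are all invented, not real data. It compares a convenience sample with a simple random sample of the same size.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: weekly study hours for 10,000 students.
# Assumption: students who are easy to recruit (near_front) study more on average.
population = []
for _ in range(10_000):
    near_front = random.random() < 0.2
    hours = random.gauss(14 if near_front else 9, 3)
    population.append({"near_front": near_front, "hours": hours})

true_mean = statistics.mean(p["hours"] for p in population)

# Convenience sample: the first 200 students who are easy to reach.
convenient = [p for p in population if p["near_front"]][:200]
convenience_mean = statistics.mean(p["hours"] for p in convenient)

# Simple random sample of the same size, drawn from the whole population.
srs = random.sample(population, 200)
srs_mean = statistics.mean(p["hours"] for p in srs)

print(f"population mean:         {true_mean:.2f}")
print(f"convenience sample mean: {convenience_mean:.2f}")
print(f"random sample mean:      {srs_mean:.2f}")
```

Under these assumptions the convenience estimate lands well above the population mean, and collecting more convenient subjects would not fix it: the error comes from who is reachable, not from sample size.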
⚠ The Key Bias Types to Know
- Selection bias: Your sampling method systematically over- or under-represents some part of the population.
- Non-response bias: People who respond to your survey differ systematically from those who don't.
- Undercoverage: Your sampling frame doesn't include parts of the population you care about.
- Sampling frame errors: Your sampling frame is wrong — outdated, mismatched, full of duplicates.
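Non-response bias can appear even when the invitation list is a proper random sample. The sketch below illustrates this with made-up numbers: the satisfaction scores and the `response_rate` table are hypothetical assumptions chosen for the example, not empirical response rates.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: satisfaction scores from 1 (low) to 5 (high).
population = [random.choice([1, 2, 3, 4, 5]) for _ in range(20_000)]

# The invitation list is a genuine simple random sample...
invited = random.sample(population, 2_000)

# ...but the chance of responding depends on the score itself.
# Assumed response rates (illustrative only): unhappy customers reply more often.
response_rate = {1: 0.60, 2: 0.40, 3: 0.15, 4: 0.15, 5: 0.25}
respondents = [s for s in invited if random.random() < response_rate[s]]

population_mean = statistics.mean(population)
invited_mean = statistics.mean(invited)
respondent_mean = statistics.mean(respondents)

print(f"population mean: {population_mean:.2f}")
print(f"invited mean:    {invited_mean:.2f}")
print(f"respondent mean: {respondent_mean:.2f} (n={len(respondents)})")
```

In this setup the invited group tracks the population closely, while the respondents skew toward low scores. The random draw was sound; the bias enters at the response step, which is why response rates and respondent characteristics are worth reporting alongside the estimate.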
💭 Reflection
A company trains a content recommendation model on data from its most engaged users, since those are the users with the most behavioral history. What type of bias does this introduce, and how might it affect model behavior for new or less-engaged users?