Sampling Bias
Certain groups are structurally unreachable — not because they refused, but because the collection method never touched them.
Scenario
An internet-usage survey is distributed only through online platforms. People without reliable internet access are excluded before a single question is asked.
[Interactive figure: Population (80 people), with a "What the numbers say" panel]
Your sample looks like online users, not the population.
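The scenario can be made concrete with a small sketch. The population size of 80 comes from the figure above. Everything else is assumed purely for illustration: the split into 60 people with internet access and 20 without, and the `uses_internet_daily` answers.

```python
# Hypothetical population of 80 people. The 60/20 access split and the
# answers below are illustrative assumptions, not data from the lesson.
population = (
    [{"has_internet": True, "uses_internet_daily": True}] * 50
    + [{"has_internet": True, "uses_internet_daily": False}] * 10
    + [{"has_internet": False, "uses_internet_daily": False}] * 20
)


def share_daily_users(people):
    """Fraction of people who report using the internet daily."""
    return sum(p["uses_internet_daily"] for p in people) / len(people)


# An online-only survey can reach only people who have internet access.
online_sample = [p for p in population if p["has_internet"]]

true_share = share_daily_users(population)       # 50 of 80
sample_share = share_daily_users(online_sample)  # 50 of 60

print(f"Whole population:   {true_share:.1%} daily users")
print(f"Online-only sample: {sample_share:.1%} daily users")
```

Under these assumptions the survey reports 83.3% daily users when the true figure is 62.5%. The gap comes entirely from who could be reached, not from how anyone answered.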
Sampling bias is systematic error in who ends up in your dataset. Four forms appear most often:
- Selection bias. Certain groups are systematically excluded — like distributing an internet-usage survey only through online platforms.
- Volunteer bias. People who opt in tend to be more motivated than the broader population, so they do not represent it.
- Response bias. Answers systematically differ from the truth, as when people over- or under-report income. Systematic differences between respondents and non-respondents are a related problem, usually called non-response bias.
- Measurement bias. The instrument itself favors certain outcomes — a thermometer that reads high will overestimate fever prevalence.
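The thermometer example can be worked through numerically. This is a minimal sketch: the 38.0 °C fever cutoff, the 0.5 °C instrument offset, and the ten temperatures are all hypothetical values chosen for illustration.

```python
FEVER_THRESHOLD_C = 38.0   # assumed cutoff for counting a reading as fever
INSTRUMENT_OFFSET_C = 0.5  # assumed: the thermometer reads 0.5 °C too high

# Hypothetical true body temperatures for ten patients.
true_temps = [36.6, 36.8, 37.0, 37.2, 37.6, 37.7, 37.9, 38.2, 38.5, 39.0]


def fever_prevalence(temps):
    """Fraction of readings at or above the fever threshold."""
    return sum(t >= FEVER_THRESHOLD_C for t in temps) / len(temps)


# Every reading is shifted upward by the same amount.
measured_temps = [t + INSTRUMENT_OFFSET_C for t in true_temps]

true_prevalence = fever_prevalence(true_temps)
measured_prevalence = fever_prevalence(measured_temps)

print(f"True prevalence:     {true_prevalence:.0%}")
print(f"Measured prevalence: {measured_prevalence:.0%}")
```

With these made-up values, three patients truly have a fever (30%), but the offset pushes three more readings over the cutoff, so the measured prevalence is 60%. Collecting more readings with the same instrument would not correct this, because the error is systematic rather than random.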
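Mitigations: random sampling (every member of the population has an equal probability of inclusion), stratified sampling (sampling from each known subgroup in proportion to its share of the population), and careful design at every stage, from how participants are reached to how questions are worded.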
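The two sampling mitigations can be sketched with Python's standard `random` module. The helper names `simple_random_sample` and `stratified_sample`, the 80-person population, and its 60/20 split by internet access are illustrative assumptions, not a prescribed implementation.

```python
import random
from collections import defaultdict


def simple_random_sample(population, k, rng):
    """Draw k members without replacement; each has the same chance."""
    return rng.sample(population, k)


def stratified_sample(population, k, key, rng):
    """Draw from each subgroup in proportion to its population share.

    Per-stratum sizes are rounded, so the total can differ slightly
    from k when the shares do not divide evenly.
    """
    strata = defaultdict(list)
    for person in population:
        strata[key(person)].append(person)

    sample = []
    for members in strata.values():
        n = round(k * len(members) / len(population))
        sample.extend(rng.sample(members, n))
    return sample


rng = random.Random(0)  # fixed seed so the draw is repeatable

# Hypothetical population: 60 people with internet access, 20 without.
population = [{"id": i, "has_internet": i < 60} for i in range(80)]

srs = simple_random_sample(population, 16, rng)
strat = stratified_sample(population, 16, lambda p: p["has_internet"], rng)

offline_in_strat = sum(not p["has_internet"] for p in strat)
print(len(srs), len(strat), offline_in_strat)  # prints: 16 16 4
```

In this sketch the stratified sample always contains exactly 4 offline people out of 16, matching their 25% share of the population. A simple random sample gets that proportion right only on average, so any single draw may over- or under-represent the offline group.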
Evaluating a Data Source: Five Criteria
- Accuracy — How closely does it reflect reality?
- Completeness — Is all necessary information present?
- Consistency — Does it match other reliable sources?
- Relevance — Is it applicable to the problem?
- Timeliness — Is it up to date?
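Two of the five criteria, completeness and timeliness, lend themselves to simple automated checks. The other three usually need an external reference or human judgment. The sketch below is one possible approach: the field names, the sample records, and the 365-day freshness window are all hypothetical.

```python
from datetime import date

REQUIRED_FIELDS = ("age", "income", "collected_on")  # hypothetical schema

# Made-up records for illustration only.
records = [
    {"age": 34, "income": 52000, "collected_on": date(2024, 3, 1)},
    {"age": 41, "income": None, "collected_on": date(2024, 3, 2)},
    {"age": None, "income": 38000, "collected_on": date(2019, 6, 15)},
    {"age": 29, "income": 61000, "collected_on": date(2024, 2, 27)},
]


def completeness(rows, fields):
    """Share of rows in which every required field has a value."""
    complete = sum(all(r.get(f) is not None for f in fields) for r in rows)
    return complete / len(rows)


def stale_rows(rows, as_of, max_age_days):
    """Rows collected more than max_age_days before the as_of date."""
    return [r for r in rows if (as_of - r["collected_on"]).days > max_age_days]


as_of = date(2024, 4, 1)  # assumed evaluation date
complete_share = completeness(records, REQUIRED_FIELDS)
stale = stale_rows(records, as_of, max_age_days=365)

print(f"Complete rows: {complete_share:.0%}")
print(f"Stale rows:    {len(stale)}")
```

For these sample records, half the rows are complete and one row is older than the assumed window. Checks like these only flag candidates for review. Deciding whether a gap or an old record actually matters depends on the problem at hand.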
Think of a dataset you've used or would like to use for a project. Apply the five evaluation criteria (accuracy, completeness, consistency, relevance, timeliness) to it. Which criterion is hardest to satisfy, and what would you do about it?