Sampling Bias

Sampling Bias Explorer

Certain groups are structurally unreachable — not because they refused, but because the collection method never touched them.

Scenario

An internet-usage survey is distributed only through online platforms. People without reliable internet access are excluded before a single question is asked.

Population (80 people)

In sample: 28. Excluded: 52.

What the numbers say

Broadband gap, all adults in the region: 38% have no home broadband

Broadband gap, online platform users only: 3% report no broadband

Your sample looks like online users, not the population.
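
The gap between the two figures can be reproduced with a small simulation. The sketch below assumes a hypothetical population in which 38% of adults lack home broadband, and in which people without broadband are far less likely to be reached through an online platform. The reach probabilities (0.05 and 0.90) and the function name are illustrative assumptions, not values taken from the survey.

```python
import random

def simulate_online_only_survey(n_adults=100_000, no_broadband_rate=0.38,
                                reach_without=0.05, reach_with=0.90, seed=0):
    """Compare the true no-broadband rate with the rate seen in an online-only sample.

    reach_without / reach_with are assumed probabilities that a person
    without / with home broadband is reached by the online survey.
    """
    rng = random.Random(seed)
    population = []  # True means the person lacks home broadband
    sample = []      # the subset actually reached by the online platform
    for _ in range(n_adults):
        lacks_broadband = rng.random() < no_broadband_rate
        population.append(lacks_broadband)
        reach = reach_without if lacks_broadband else reach_with
        if rng.random() < reach:
            sample.append(lacks_broadband)
    true_rate = sum(population) / len(population)
    sample_rate = sum(sample) / len(sample)
    return true_rate, sample_rate

true_rate, sample_rate = simulate_online_only_survey()
print(f"population:         {true_rate:.1%} lack home broadband")
print(f"online-only sample: {sample_rate:.1%} lack home broadband")
```

Under these assumptions the expected share in the sample is 0.38 × 0.05 / (0.38 × 0.05 + 0.62 × 0.90) ≈ 3.3%, close to the 3% reported by online platform users, even though the population rate is 38%.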

Sampling bias is systematic error in who ends up in your dataset. Four forms appear most often:

Selection bias. Certain groups are systematically excluded — like distributing an internet-usage survey only through online platforms.
Volunteer bias. Motivated self-selectors aren't representative of the broader population.
Response bias. Respondents systematically misreport, for example by over- or under-reporting income. A related problem, non-response bias, arises when respondents differ systematically from non-respondents.
Measurement bias. The instrument itself favors certain outcomes — a thermometer that reads high will overestimate fever prevalence.
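
The thermometer example can be made concrete with a short simulation. This is a minimal sketch under stated assumptions: true body temperatures are drawn from a normal distribution (mean 37.0 °C, standard deviation 0.6 °C), fever is defined as 38.0 °C or above, and the faulty thermometer reads 0.5 °C high. All of these numbers are hypothetical and chosen only for illustration.

```python
import random

FEVER_THRESHOLD_C = 38.0  # assumed cutoff for "fever"

def fever_prevalence(readings):
    """Fraction of readings at or above the fever threshold."""
    return sum(t >= FEVER_THRESHOLD_C for t in readings) / len(readings)

rng = random.Random(1)
# Hypothetical true temperatures for 10,000 people.
true_temps = [rng.gauss(37.0, 0.6) for _ in range(10_000)]

# A thermometer that reads high adds a constant offset to every reading.
offset_c = 0.5
measured_temps = [t + offset_c for t in true_temps]

true_prev = fever_prevalence(true_temps)
measured_prev = fever_prevalence(measured_temps)
print(f"true fever prevalence:     {true_prev:.1%}")
print(f"measured fever prevalence: {measured_prev:.1%}")
```

Because every reading is shifted upward, the measured prevalence can never be lower than the true one, and no increase in sample size corrects it. The error comes from the instrument, not from who was sampled.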

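Mitigations: simple random sampling (every member of the population has an equal probability of inclusion), stratified sampling (sampling from known subgroups in proportion to their share of the population), and careful design throughout, such as choosing a collection channel that can reach every group.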
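
The two sampling schemes can be sketched with Python's standard library. The population below is hypothetical: 80 people, 30 of them without home broadband. The helper names `simple_random_sample` and `proportional_stratified_sample` are introduced here for illustration only.

```python
import random
from collections import defaultdict

def simple_random_sample(population, k, seed=0):
    """Draw k units without replacement; every unit is equally likely to be chosen."""
    return random.Random(seed).sample(population, k)

def proportional_stratified_sample(population, stratum_of, k, seed=0):
    """Draw from each stratum in proportion to its share of the population.

    Per-stratum counts are rounded, so the total can differ slightly from k
    when the proportions do not divide evenly.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        n_take = round(k * len(members) / len(population))
        sample.extend(rng.sample(members, n_take))
    return sample

# Hypothetical population: ids 0-29 have no home broadband, ids 30-79 do.
population = [{"id": i, "broadband": i >= 30} for i in range(80)]

srs = simple_random_sample(population, 16)
strat = proportional_stratified_sample(population, lambda p: p["broadband"], 16)

no_broadband_in_strat = sum(not p["broadband"] for p in strat)
print(len(srs), len(strat), no_broadband_in_strat)
```

With 16 draws, the stratified sample always contains 6 people without broadband (16 × 30/80) and 10 with it, whereas the simple random sample only matches that split on average. Both approaches presuppose a sampling frame that lists the whole population, which is exactly what an online-only survey lacks.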

Evaluating a Data Source: Five Criteria

Accuracy — How closely does it reflect reality?
Completeness — Is all necessary information present?
Consistency — Does it match other reliable sources?
Relevance — Is it applicable to the problem?
Timeliness — Is it up to date?
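
Two of these criteria, completeness and timeliness, lend themselves to quick automated checks, while accuracy, consistency, and relevance usually need outside references or human judgment. The sketch below is one possible way to compute them. The record layout, the field names (`income`, `collected`), and the helper functions are hypothetical.

```python
from datetime import date

def completeness(records, required_fields):
    """Fraction of records in which every required field is present and non-empty."""
    complete = sum(
        all(rec.get(field) not in (None, "") for field in required_fields)
        for rec in records
    )
    return complete / len(records)

def days_since_latest(records, date_field, today):
    """Age in days of the most recent record, as a rough timeliness measure."""
    latest = max(rec[date_field] for rec in records)
    return (today - latest).days

# Hypothetical survey records; the second one is missing its income value.
records = [
    {"id": 1, "income": 42000, "collected": date(2023, 1, 10)},
    {"id": 2, "income": None, "collected": date(2023, 3, 5)},
    {"id": 3, "income": 38500, "collected": date(2022, 11, 20)},
]

share_complete = completeness(records, ["id", "income"])
age_days = days_since_latest(records, "collected", today=date(2023, 4, 4))
print(f"complete records: {share_complete:.0%}, newest record is {age_days} days old")
```

What counts as acceptable for either number depends on the project: a 30-day-old record may be fresh for census-style data and stale for web traffic.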

Checkpoint

Think of a dataset you've used or would like to use for a project. Apply the five evaluation criteria (accuracy, completeness, consistency, relevance, timeliness) to it. Which criterion is hardest to satisfy, and what would you do about it?
