Concept

Sampling Bias

Definition

Sampling bias is a systematic, non-random error introduced when some members of the target population have a higher or lower probability of being included in the sample than the analysis assumes. The result is a sample whose characteristics differ from the population in a predictable direction, and any estimate built from that sample inherits the same tilt. Crucially, sampling bias does not vanish as the sample grows: a larger sample shrinks random error but leaves the systematic error untouched, so collecting more biased data merely produces a more confidently wrong answer.
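A small simulation can make the point about sample size concrete. The sketch below is illustrative only: the population, its log-normal shape, and the selection mechanism (larger values are more likely to be picked) are all assumptions invented for the example, and `mean_ratios` is a hypothetical helper name.

```python
import random

def mean_ratios(sample_sizes, seed=0):
    """Return {n: (fair_ratio, tilted_ratio)}, each sample mean divided by the true mean."""
    rng = random.Random(seed)
    # Hypothetical population: 50,000 right-skewed values (think incomes).
    population = [rng.lognormvariate(10, 0.5) for _ in range(50_000)]
    true_mean = sum(population) / len(population)

    ratios = {}
    for n in sample_sizes:
        # Fair draw (with replacement): every unit is equally likely to be picked.
        fair = rng.choices(population, k=n)
        # Tilted draw (assumed mechanism): larger values are more likely to be picked.
        tilted = rng.choices(population, weights=population, k=n)
        ratios[n] = (sum(fair) / n / true_mean, sum(tilted) / n / true_mean)
    return ratios

ratios = mean_ratios([100, 10_000, 200_000])
for n, (fair_ratio, tilted_ratio) in ratios.items():
    print(f"n={n:>7}  fair/true={fair_ratio:.3f}  tilted/true={tilted_ratio:.3f}")
```

Under these assumptions the fair ratio should settle near 1 as n grows, while the tilted ratio should settle at a value well above 1 and stay there. The larger sample only makes the wrong number more stable.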

Why it matters

The phenomenon is one of the most common reasons that careful-looking analyses produce conclusions that fail to replicate or fall apart when applied to a different population. It is also one of the easiest errors to commit and one of the hardest to diagnose from the data alone.

How it works

The classic example is the 1936 Literary Digest presidential poll, which built its mailing list from its own subscribers, automobile registrations, and telephone directories, collected millions of responses, and confidently predicted that Alf Landon would defeat Franklin Roosevelt in a landslide. Roosevelt won 46 of 48 states. The sample was enormous but drawn from a frame that systematically over-represented wealthier Americans, who in 1936 differed sharply from the broader electorate in their voting preferences. The bias was structural, baked into who could be reached at all, and no number of additional respondents from the same frame would have corrected the error. A much smaller sample collected by George Gallup that year, built with quotas meant to mirror the electorate rather than drawn from convenient lists, did call the result correctly.
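The frame problem can be sketched as a toy simulation. Every number below is hypothetical and chosen only to illustrate the mechanism; none of it is historical polling data. The assumed setup is an electorate in which wealthier voters are a minority, lean against the incumbent, and are far more likely to appear on the lists the poll is drawn from.

```python
import random

rng = random.Random(1936)

def make_voter(rng):
    """Return (supports_incumbent, in_frame) for one hypothetical voter."""
    wealthy = rng.random() < 0.30                           # assumed share of wealthy voters
    supports = rng.random() < (0.40 if wealthy else 0.70)   # assumed support rates
    in_frame = rng.random() < (0.80 if wealthy else 0.10)   # assumed chance of being on a list
    return supports, in_frame

def support_share(voters):
    """Fraction of the given voters who support the incumbent."""
    return sum(supports for supports, _ in voters) / len(voters)

electorate = [make_voter(rng) for _ in range(200_000)]
frame = [voter for voter in electorate if voter[1]]  # only listed voters can be polled

truth = support_share(electorate)
big_frame_poll = support_share(rng.sample(frame, 50_000))        # huge, but frame-limited
small_random_poll = support_share(rng.sample(electorate, 1_500))  # small, but drawn from everyone

print(f"true support:        {truth:.3f}")
print(f"50,000 from frame:   {big_frame_poll:.3f}")
print(f"1,500 from everyone: {small_random_poll:.3f}")
```

With these assumed rates the frame is dominated by wealthier voters, so the large poll should point to the wrong winner while the small random sample should land close to the true share. Adding respondents from the same frame would not change that.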

Survivorship bias works similarly. Studying the habits of successful companies, durable buildings, or long-lived investors tells you about the survivors but says little about the full population of attempts, many of which had identical habits and failed. Non-response bias arises when the people who decline to answer differ systematically from those who reply; a patient-satisfaction survey that mostly the happy patients return pushes the rating upward. The common pattern is that the bias lives in the gap between the population we want to know about and the population we actually observe, and closing that gap requires thinking about the sampling mechanism, not just the numbers it produced.
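One way to see why the sampling mechanism matters is to simulate the patient-satisfaction case and then reweight by it. The sketch below is idealized: the ratings, the response probabilities in `RESPONSE_PROB`, and the assumption that those probabilities are known exactly are all invented for the example. In practice they would have to be estimated, and a poor estimate would reintroduce bias.

```python
import random

rng = random.Random(7)

# Assumed chance that a patient returns the survey, by rating given (1 = unhappy, 5 = happy).
RESPONSE_PROB = {1: 0.05, 2: 0.10, 3: 0.20, 4: 0.40, 5: 0.60}

# Hypothetical population: 100,000 patients with ratings spread evenly from 1 to 5.
patients = [rng.choice([1, 2, 3, 4, 5]) for _ in range(100_000)]
true_mean = sum(patients) / len(patients)

# Only some surveys come back, and happier patients return them more often.
returned = [rating for rating in patients if rng.random() < RESPONSE_PROB[rating]]
naive_mean = sum(returned) / len(returned)

# Inverse-probability weighting: each returned survey stands in for 1/p patients like it.
weights = [1 / RESPONSE_PROB[rating] for rating in returned]
weighted_mean = sum(w * r for w, r in zip(weights, returned)) / sum(weights)

print(f"true mean rating:     {true_mean:.2f}")
print(f"naive mean (returns): {naive_mean:.2f}")
print(f"reweighted mean:      {weighted_mean:.2f}")
```

Under these assumptions the naive mean should sit well above the true mean, while the reweighted mean should land close to it. The correction works only because the response mechanism is modeled; the returned ratings alone give no hint that anything is wrong.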

Where it goes next

Random Sample (shares tag: bias)
Sampling (shares tag: sampling)
Measurement Error (shares tag: bias)
Misleading Statistics (shares tag: bias)
Polling (shares tag: sampling)
Regression to the Mean (shares tag: bias)
Sample Size (shares tag: sampling)
Sampling Distribution (shares tag: sampling)
80/20 Rule (shares tag: statistics)
Bar Chart (shares tag: statistics)
Base Rate (shares tag: statistics)
Blind Spots (shares tag: bias)
Causation (shares tag: statistics)
Central Tendency (shares tag: statistics)
Clinical Trial (shares tag: statistics)
Conditional Value-at-Risk (shares tag: statistics)
Confidence Interval (shares tag: statistics)
Correlation (shares tag: statistics)
Correlation Coefficient (shares tag: statistics)
Correlation vs Causation (shares tag: statistics)
Cost-Effectiveness (shares tag: statistics)
Data Literacy (shares tag: statistics)
Decision Under Uncertainty (shares tag: statistics)
Descriptive Statistics (shares tag: statistics)
Discrete Data (shares tag: statistics)
Distant Threats (shares tag: bias)
Distribution (Market Phase) (shares tag: statistics)
Distributions (shares tag: statistics)
Dollar Street (shares tag: statistics)
Doubling Line (shares tag: statistics)
Ego Defense (shares tag: bias)
Epidemiology (shares tag: statistics)
Experimental Design (shares tag: statistics)
Failure Rate (shares tag: statistics)
Frequentist Probability (shares tag: statistics)
Frightening vs Dangerous (shares tag: statistics)
Generational Myopia (shares tag: bias)
Histogram (shares tag: statistics)
Hypothesis Testing (shares tag: statistics)
Income Levels (shares tag: statistics)
Information Coefficient (shares tag: statistics)
Least Squares (shares tag: statistics)
Level vs Direction (shares tag: statistics)
Linear Regression (shares tag: statistics)
Lonely Number (shares tag: statistics)
Majority Trap (shares tag: statistics)
Mean (shares tag: statistics)
Mean Reversion (shares tag: statistics)
Median (shares tag: statistics)
Mutually Exclusive (shares tag: statistics)
Null Hypothesis (shares tag: statistics)
Overfitting (shares tag: statistics)
P-Value (shares tag: statistics)
Peak Child (shares tag: statistics)
Per Capita Ratio (shares tag: statistics)
Percentage (shares tag: statistics)
Performance Rank (shares tag: statistics)
Pie Chart (shares tag: statistics)
Placebo Effect (shares tag: statistics)
Population Projection (shares tag: statistics)
Precision vs. Accuracy (shares tag: statistics)
Prejudice (shares tag: bias)
Principal Component Analysis (shares tag: statistics)
Probability (shares tag: statistics)
Questionnaire Design (shares tag: statistics)
Randomisation (shares tag: statistics)
Rank Correlation (shares tag: statistics)
Rationality (shares tag: bias)
Returns (shares tag: statistics)
Risk Calculation (shares tag: statistics)
Rolling Metrics (shares tag: statistics)
S-Curve (shares tag: statistics)
Significance Level (shares tag: statistics)
Simpson's Paradox (shares tag: statistics)
Size Instinct (shares tag: statistics)
Slow Change (shares tag: statistics)
Small Steps (shares tag: statistics)
Spurious Correlation (shares tag: statistics)
Standard Deviation (shares tag: statistics)
Statistical Inference (shares tag: statistics)
Statistical Significance (shares tag: statistics)
Straight Line Instinct (shares tag: statistics)
Taphonomy (shares tag: bias)
Time-Series Data (shares tag: statistics)
Z-Score (shares tag: statistics)
