Concept

Spurious Correlation

Definition

A spurious correlation is a statistical relationship between two variables that exists in the data but does not reflect any direct causal link between them. The numbers move together, sometimes with a high correlation coefficient and a tiny p-value, but neither one is producing the other. The apparent association is an artefact of a hidden third factor, of how the data was selected, or simply of chance, particularly when many comparisons are run or the sample is small.

Why it matters

Spurious correlations are among the most common reasons that headline statistical findings turn out to be misleading. The mathematics behind correlation is silent about cause; only an outside argument (a controlled experiment, a domain mechanism, a careful design) can promote an association into a causal claim.

How it works

The classic textbook example is the strong positive correlation between ice cream sales and drowning deaths. Higher ice cream sales in a given month do not cause drowning, nor does drowning cause ice cream consumption. Both variables are driven by a third — hot weather — which independently increases swimming activity and ice cream purchases. The shared cause produces the association even though the two outcomes are causally independent. Statisticians call this third variable a confounder, and the only general defence against it is to identify and condition on it, or to randomise the input of interest so that confounders cannot align systematically with it.
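The confounding mechanism can be illustrated with a small simulation. This is a minimal sketch rather than real data: the temperature distribution, the coefficients, and the noise levels are all made-up assumptions, and the helper names (`pearson`, `residuals`, `simulate`) are hypothetical. Temperature drives both outcomes, so they correlate strongly; once temperature is regressed out of each one, the remaining (partial) correlation is close to zero.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """Residuals of ys after a simple least-squares fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


def simulate(n=5000, seed=0):
    """Toy data: temperature drives both outcomes; they never touch each other.

    All numbers here are illustrative assumptions in arbitrary units.
    """
    rng = random.Random(seed)
    temperature = [rng.gauss(20, 8) for _ in range(n)]
    ice_cream = [3.0 * t + rng.gauss(0, 10) for t in temperature]
    drownings = [0.5 * t + rng.gauss(0, 3) for t in temperature]
    return temperature, ice_cream, drownings


temperature, ice_cream, drownings = simulate()

# Raw association between the two outcomes (strongly positive).
raw_r = pearson(ice_cream, drownings)

# Partial correlation: correlate what is left after removing temperature.
partial_r = pearson(residuals(ice_cream, temperature),
                    residuals(drownings, temperature))

print(f"raw correlation:     {raw_r:.2f}")
print(f"partial correlation: {partial_r:.2f}")
```

Conditioning works here only because the confounder was measured and its effect is linear by construction. With an unmeasured confounder there is nothing to regress out, which is why randomisation is the more general defence.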

A second mechanism is plain chance over many comparisons. If you compute correlations between every pair of variables in a dataset of a hundred columns, you will perform 100 × 99 / 2 = 4,950 tests. At a significance level of 0.05, roughly two hundred and fifty of those will reach significance by accident even if every pair is truly independent. The author Tyler Vigen has assembled hundreds of these (divorce rates in Maine vs. per-capita margarine consumption, US spending on science vs. suicides by hanging) to make the point. The numbers are real; the relationships are not.

A third mechanism is selection: if the sample only includes individuals who scored highly on at least one of two unrelated traits, the two traits will appear negatively correlated within the sample even though they are independent in the population. Anyone who is low on one trait must be high on the other to have been included at all, and that requirement alone manufactures the negative association.
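Both the chance and the selection mechanisms can be checked numerically. The sketch below is illustrative only: the data are independent random noise, the row count, cutoff, and seeds are arbitrary assumptions, the function names are hypothetical, and significance is judged with the Fisher z approximation rather than an exact test. It counts how many of the 4,950 pairs look significant by accident, then shows a negative correlation appearing once a sample is filtered on "high on at least one trait".

```python
import math
import random
from statistics import NormalDist


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def count_chance_hits(n_cols=100, n_rows=50, alpha=0.05, seed=1):
    """Count 'significant' pairs among columns of pure independent noise."""
    rng = random.Random(seed)
    cols = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_cols)]
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    hits = 0
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            r = pearson(cols[i], cols[j])
            # Fisher z approximation: atanh(r) * sqrt(n - 3) is roughly N(0, 1)
            z = math.atanh(r) * math.sqrt(n_rows - 3)
            if abs(z) > z_crit:
                hits += 1
    return hits


def selection_effect(n=20000, cutoff=1.0, seed=2):
    """Correlation of two independent traits, before and after selection."""
    rng = random.Random(seed)
    people = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    # Keep only individuals who score highly on at least one trait.
    selected = [(a, b) for a, b in people if a > cutoff or b > cutoff]
    full_r = pearson([a for a, _ in people], [b for _, b in people])
    selected_r = pearson([a for a, _ in selected], [b for _, b in selected])
    return full_r, selected_r


n_pairs = math.comb(100, 2)                  # number of distinct column pairs
expected_false_positives = n_pairs * 0.05    # expected hits if all are independent
hits = count_chance_hits()
full_r, selected_r = selection_effect()

print(f"pairs tested: {n_pairs}, expected by chance: {expected_false_positives}")
print(f"'significant' pairs found in pure noise: {hits}")
print(f"trait correlation, whole population: {full_r:.2f}")
print(f"trait correlation, selected sample:  {selected_r:.2f}")
```

The simulated hit count will differ from run to run with the seed, but it should land near the expected value, which is the point: a few hundred "findings" with nothing behind them.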

Where it goes next

Correlation vs Causation (shares tag: causation)
Causation (shares tag: causation)
Correlation Coefficient (shares tag: correlation)
Experimental Design (shares tag: causation)
Rank Correlation (shares tag: correlation)
Simpson's Paradox (shares tag: causation)
80/20 Rule (shares tag: statistics)
Attribution (shares tag: causation)
Bar Chart (shares tag: statistics)
Base Rate (shares tag: statistics)
Central Tendency (shares tag: statistics)
Clinical Trial (shares tag: statistics)
Conditional Value-at-Risk (shares tag: statistics)
Confidence Interval (shares tag: statistics)
Correlation (shares tag: statistics)
Cost-Effectiveness (shares tag: statistics)
Data Literacy (shares tag: statistics)
Decision Under Uncertainty (shares tag: statistics)
Descriptive Statistics (shares tag: statistics)
Discrete Data (shares tag: statistics)
Distribution (Market Phase) (shares tag: statistics)
Distributions (shares tag: statistics)
Dollar Street (shares tag: statistics)
Doubling Line (shares tag: statistics)
Epidemiology (shares tag: statistics)
Failure Rate (shares tag: statistics)
Frequentist Probability (shares tag: statistics)
Frightening vs Dangerous (shares tag: statistics)
Great-Man Theory (shares tag: causation)
Histogram (shares tag: statistics)
Hypothesis Testing (shares tag: statistics)
Income Levels (shares tag: statistics)
Information Coefficient (shares tag: statistics)
Least Squares (shares tag: statistics)
Level vs Direction (shares tag: statistics)
Linear Regression (shares tag: statistics)
Lonely Number (shares tag: statistics)
Majority Trap (shares tag: statistics)
Mean (shares tag: statistics)
Mean Reversion (shares tag: statistics)
Measurement Error (shares tag: statistics)
Median (shares tag: statistics)
Misleading Statistics (shares tag: statistics)
Mutually Exclusive (shares tag: statistics)
Necessity and Sufficiency (shares tag: causation)
Null Hypothesis (shares tag: statistics)
Overfitting (shares tag: statistics)
P-Value (shares tag: statistics)
Peak Child (shares tag: statistics)
Per Capita Ratio (shares tag: statistics)
Percentage (shares tag: statistics)
Performance Rank (shares tag: statistics)
Pie Chart (shares tag: statistics)
Placebo Effect (shares tag: statistics)
Polling (shares tag: statistics)
Population Projection (shares tag: statistics)
Precision vs. Accuracy (shares tag: statistics)
Principal Component Analysis (shares tag: statistics)
Probability (shares tag: statistics)
Questionnaire Design (shares tag: statistics)
Random Sample (shares tag: statistics)
Randomisation (shares tag: statistics)
Regression to the Mean (shares tag: statistics)
Returns (shares tag: statistics)
Risk Calculation (shares tag: statistics)
Rolling Metrics (shares tag: statistics)
S-Curve (shares tag: statistics)
Sample Size (shares tag: statistics)
Sampling (shares tag: statistics)
Sampling Bias (shares tag: statistics)
Sampling Distribution (shares tag: statistics)
Significance Level (shares tag: statistics)
Size Instinct (shares tag: statistics)
Slow Change (shares tag: statistics)
Small Steps (shares tag: statistics)
Standard Deviation (shares tag: statistics)
Statistical Inference (shares tag: statistics)
Statistical Significance (shares tag: statistics)
Straight Line Instinct (shares tag: statistics)
Time-Series Data (shares tag: statistics)
Z-Score (shares tag: statistics)
