Definition
Correlation versus causation is the discipline of distinguishing two variables that merely move together from two variables in which one actually drives the other. Correlation is a statistical pattern — when one variable changes, the other tends to change in a predictable way. Causation is a mechanistic claim — change the first variable and the second responds because of that change.
Why it matters
The distinction matters because policy, medicine, and personal decisions usually demand causal knowledge, while the data most readily available supports only correlational claims. Many well-known statistical errors trace back to treating a correlation as if it were a causal relationship: the textbook link between ice-cream sales and drownings, the once-confident observational reports linking hormone replacement therapy to reduced heart disease, and any number of failed economic forecasts.
How it works
When two variables A and B move together, there are at least four explanations to consider. A may cause B; B may cause A; both may be driven by a third variable, C, that has not been measured; or the apparent pattern may be the product of chance in a small sample. Establishing that A causes B requires evidence that rules out the other three alternatives, not just evidence of co-movement.
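The third explanation, an unmeasured common cause, can be illustrated with a small simulation. The sketch below is illustrative only: the coefficients, sample size, and helper names (pearson, residuals) are assumptions chosen for the example, not taken from any particular study. It generates A and B so that each depends on C and on independent noise, with no arrow between A and B, then compares the raw correlation with the correlation that remains once the linear effect of C is removed from both.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical data-generating process: C drives both A and B.
c = [rng.gauss(0, 1) for _ in range(n)]        # the confounder
a = [2.0 * ci + rng.gauss(0, 1) for ci in c]   # A depends on C only
b = [1.5 * ci + rng.gauss(0, 1) for ci in c]   # B depends on C only

raw_r = pearson(a, b)
# Partial correlation: correlate what remains of A and B once C is removed.
partial_r = pearson(residuals(a, c), residuals(b, c))

print(f"raw correlation of A and B: {raw_r:.2f}")
print(f"after adjusting for C:      {partial_r:.2f}")
```

Under these assumptions the raw correlation should come out strongly positive while the adjusted one sits close to zero. The adjustment works here only because C was recorded and its effect is linear; with real data the confounder is often unmeasured, which is why this kind of check cannot by itself establish a causal arrow.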
Reverse causation is easy to mistake for causation in the expected direction. A study showing that successful companies have flat organisational structures cannot tell us whether flatness produced success, success allowed flatness, or both reflect a third trait such as adaptive leadership. Confounding is even more common. Ice-cream sales correlate with drowning rates because hot weather causes both; neither variable drives the other. Chance is the residual category: with enough variables and enough comparisons, some correlations will arise from sampling noise alone, which is why multiple-comparison corrections and replication matter. The serious work of statistics is not finding correlations but earning the right to draw a causal arrow between them.
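The point about chance can be made concrete with pure noise. The following is a minimal sketch under stated assumptions: 40 unrelated variables, 20 observations each, and a conventional two-sided 5% significance cut-off (the critical t value of 2.101 for 18 degrees of freedom is the standard table value). All names and sizes are illustrative. It counts how many of the pairwise correlations clear the cut-off even though no variable is related to any other, then redraws fresh data for the strongest pair as a stand-in for a replication study.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(1)  # fixed seed so the run is repeatable
n_obs, n_vars = 20, 40

# Every variable is independent noise: no real relationships exist.
data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]

# Convert the critical t value into a critical correlation for this sample size.
t_crit = 2.101  # two-sided 5% critical t for n_obs - 2 = 18 degrees of freedom
r_crit = t_crit / (t_crit ** 2 + (n_obs - 2)) ** 0.5  # roughly 0.444

pairs = list(combinations(range(n_vars), 2))
rs = {(i, j): pearson(data[i], data[j]) for i, j in pairs}
flagged = [pair for pair, r in rs.items() if abs(r) > r_crit]
best_pair = max(rs, key=lambda pair: abs(rs[pair]))

# "Replication": draw fresh, equally unrelated data for the strongest pair.
fresh_x = [rng.gauss(0, 1) for _ in range(n_obs)]
fresh_y = [rng.gauss(0, 1) for _ in range(n_obs)]
replicated_r = pearson(fresh_x, fresh_y)

print(f"pairs tested: {len(pairs)}")
print(f"pairs passing the 5% cut-off by chance: {len(flagged)}")
print(f"strongest pair {best_pair}: r = {rs[best_pair]:.2f}")
print(f"same pair on fresh data: r = {replicated_r:.2f}")
```

With 780 comparisons and a 5% cut-off, roughly 39 pairs would be expected to pass by chance alone. A multiple-comparison correction such as Bonferroni would divide the 5% level by the number of tests, making the cut-off far stricter, and the fresh-data check shows why an unreplicated "discovery" from a large search deserves suspicion.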