Concept

Overfitting

Definition

Overfitting is the failure mode in which a statistical or machine-learning model captures the idiosyncratic noise of its training data instead of the underlying signal, and so performs brilliantly on data it has already seen and poorly on anything new. The classic diagnostic is a widening gap between training-set performance and validation-set performance as model complexity grows. A linear regression with three features may have a modest in-sample R-squared and a similar out-of-sample R-squared; the same regression with three hundred features, fitted to a sample not much larger than that, may explain 95% of the training variance and 0% of the test variance.
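
The sketch below makes the train/validation diagnostic concrete by scaling the regression example down. It is a minimal Python sketch using only the standard library, and the setup is an illustrative assumption: 40 training rows, 40 held-out rows, one genuinely predictive feature under an assumed `y = 2x + noise` process, and up to 35 extra features that are pure random noise. The helper names (`fit_ols`, `r_squared`, `make_data`) are hypothetical and are not taken from any library. Each larger model contains the smaller model's features, so in-sample R-squared can only rise as noise features are added. Out-of-sample R-squared falls and can drop below zero, which means the model predicts worse than the test-set mean would.

```python
import random


def solve(a, b):
    """Solve the square system a . w = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def r_squared(X, y, w):
    """1 - SSE/SST; negative when the model does worse than predicting the mean of y."""
    pred = [sum(wi * xi for wi, xi in zip(w, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def make_data(n, n_noise, rng):
    """Rows of [intercept, signal feature, n_noise pure-noise features]; y = 2*x + noise."""
    X, y = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        X.append([1.0, x] + [rng.gauss(0, 1) for _ in range(n_noise)])
        y.append(2 * x + rng.gauss(0, 1))
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(40, 35, rng)
X_test, y_test = make_data(40, 35, rng)

results = {}
for k in (0, 5, 20, 35):
    cols = 2 + k  # intercept + signal + the first k noise features (nested models)
    train = [row[:cols] for row in X_train]
    test = [row[:cols] for row in X_test]
    w = fit_ols(train, y_train)
    results[k] = (r_squared(train, y_train, w), r_squared(test, y_test, w))
    print(f"{k:2d} noise features: train R2 = {results[k][0]:.2f}, test R2 = {results[k][1]:.2f}")
```

The exact figures depend on the random seed. What to look for is the gap between the two columns growing as `k` increases.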

The mechanism is straightforward: every additional parameter gives the model another degree of freedom to bend itself around training points. Enough degrees of freedom and any finite dataset can be fit exactly. What is being fit, at that point, is not the systematic relationship but the unrepeatable noise.
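
The claim that enough degrees of freedom let a model fit a finite dataset exactly can be checked directly. The plain-Python sketch below passes a degree-nine polynomial through ten points using Lagrange interpolation, so there are ten points and ten free coefficients. Its settings are assumptions chosen for illustration: ten evenly spaced inputs, a straight-line signal `2x + 1`, and Gaussian noise with standard deviation 0.3. The training residuals come out as zero. At every training point, the curve's departure from the true signal is exactly the noise that was added, which is what "fitting the noise" means. Fresh draws from the same process carry different noise, so the held-out error stays above zero.

```python
import random


def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the unique polynomial of degree len(xs) - 1 through every (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def signal(x):
    """The systematic relationship (assumed linear for this illustration)."""
    return 2.0 * x + 1.0


rng = random.Random(1)
n = 10
xs = [i / (n - 1) for i in range(n)]     # ten evenly spaced inputs on [0, 1]
noise = [rng.gauss(0, 0.3) for _ in xs]  # the unrepeatable part of this sample
ys = [signal(x) + e for x, e in zip(xs, noise)]

# Ten points, ten polynomial coefficients: the training fit is exact...
train_residuals = [y - lagrange_interpolate(xs, ys, x) for x, y in zip(xs, ys)]
# ...and at each training point the curve sits exactly one noise draw away from the signal.
absorbed = [lagrange_interpolate(xs, ys, x) - signal(x) for x in xs]

# Fresh observations from the same process carry different noise.
fresh_x = [rng.random() for _ in range(1000)]
fresh = [(x, signal(x) + rng.gauss(0, 0.3)) for x in fresh_x]
test_mse = sum((y - lagrange_interpolate(xs, ys, x)) ** 2 for x, y in fresh) / len(fresh)

print("largest training residual:", max(abs(r) for r in train_residuals))
print("held-out mean squared error:", round(test_mse, 3))
```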

Why it matters

How it works

Overfitting arises whenever a model has more capacity than the data can support. Capacity comes from many sources: the number of free parameters, the flexibility of the functional form, the depth of a decision tree, the number of features included, the number of times the modeller iterates on the same training set. Each source increases the model's ability to express any pattern — including the noise pattern that happens to be in this particular sample.
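
Tree depth is the easiest of these capacity sources to vary on its own. The following is a minimal, hypothetical sketch of a greedy one-dimensional regression tree in plain Python. The helpers `build_tree`, `predict` and `mse`, the sine-wave signal and the noise level are all assumptions for illustration, not a reference implementation of any library's tree. Each extra level of depth can only lower training error. Once the tree is deep enough to give every training point its own leaf, training error reaches zero, but error on fresh data does not.

```python
import math
import random


def sse(points):
    """Sum of squared deviations of the y values from their mean."""
    mean = sum(y for _, y in points) / len(points)
    return sum((y - mean) ** 2 for _, y in points)


def build_tree(points, max_depth, depth=0):
    """Greedy 1-D regression tree; max_depth is the capacity knob."""
    mean = sum(y for _, y in points) / len(points)
    if depth == max_depth or len(points) < 2:
        return ("leaf", mean)
    pts = sorted(points)
    best = None  # (cost, split index)
    for i in range(1, len(pts)):
        if pts[i - 1][0] == pts[i][0]:
            continue  # identical inputs cannot be separated
        cost = sse(pts[:i]) + sse(pts[i:])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        return ("leaf", mean)
    i = best[1]
    threshold = (pts[i - 1][0] + pts[i][0]) / 2
    return ("split", threshold,
            build_tree(pts[:i], max_depth, depth + 1),
            build_tree(pts[i:], max_depth, depth + 1))


def predict(node, x):
    """Walk from the root to a leaf and return that leaf's mean."""
    while node[0] == "split":
        _, threshold, left, right = node
        node = left if x <= threshold else right
    return node[1]


def mse(tree, points):
    return sum((y - predict(tree, x)) ** 2 for x, y in points) / len(points)


rng = random.Random(7)


def sample(n):
    """Noisy draws from a sine-wave signal (illustrative choice of signal and noise level)."""
    xs = [rng.random() for _ in range(n)]
    return [(x, math.sin(2 * math.pi * x) + rng.gauss(0, 0.3)) for x in xs]


train, test = sample(30), sample(500)
depths = [0, 1, 2, 3, 4, 6, 8, 30]
train_mse, test_mse = {}, {}
for d in depths:
    tree = build_tree(train, d)
    train_mse[d], test_mse[d] = mse(tree, train), mse(tree, test)
    print(f"depth {d:2d}: train MSE = {train_mse[d]:.3f}, test MSE = {test_mse[d]:.3f}")
```

Printed side by side, held-out error typically bottoms out at a moderate depth and then climbs as the tree starts memorising noise. The exact turning point depends on the seed.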

The defences fall into three families. Statistical defences add penalties for complexity: L1 and L2 regularisation, information criteria, early stopping on a validation loss. Procedural defences split the data: train on one slice, validate on another, hold a third in reserve that nobody is permitted to look at until a single final evaluation. Conceptual defences impose structural priors: rather than letting the model find any pattern, restrict it to patterns that make economic or physical sense. The most important defence is the hardest to systematise — limit how many distinct hypotheses are tested against the same dataset, because the probability of one of them passing by chance grows with the number of tries.
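
The last defence can be quantified. Suppose m hypotheses that are all in fact false are each tested independently at significance level α. The chance that at least one of them passes is 1 - (1 - α)^m, and with α = 0.05 and 20 tests that is already about 64%. The plain-Python sketch below computes that figure and then simulates the same effect as it arises through selection. It generates 100 candidate predictors that are pure noise, keeps whichever one correlates best with a noise target in-sample, and then scores that winner on held-out data. The independence assumption, the sample sizes and the helper names (`chance_of_false_discovery`, `select_then_retest`) are illustrative choices, not a prescribed procedure.

```python
import random


def chance_of_false_discovery(m, alpha=0.05):
    """P(at least one of m independent tests of true null hypotheses passes at level alpha)."""
    return 1 - (1 - alpha) ** m


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def select_then_retest(n_candidates, n_obs, rng):
    """Pick the candidate with the best in-sample correlation, then score it on held-out data."""
    # Target and candidates are all pure noise: there is no real relationship to discover.
    target = [rng.gauss(0, 1) for _ in range(2 * n_obs)]
    best_in, best_out = float("-inf"), 0.0
    for _ in range(n_candidates):
        cand = [rng.gauss(0, 1) for _ in range(2 * n_obs)]
        r_in = corr(cand[:n_obs], target[:n_obs])  # first half: used for selection
        if r_in > best_in:
            best_in = r_in
            best_out = corr(cand[n_obs:], target[n_obs:])  # second half: never used for selection
    return best_in, best_out


for m in (1, 20, 100):
    print(f"{m:3d} tests at alpha = 0.05: P(at least one false positive) = {chance_of_false_discovery(m):.3f}")

rng = random.Random(42)
trials = [select_then_retest(100, 40, rng) for _ in range(50)]
avg_in = sum(r_in for r_in, _ in trials) / len(trials)
avg_out = sum(r_out for _, r_out in trials) / len(trials)
print(f"winning candidate: in-sample r = {avg_in:.2f}, held-out r = {avg_out:.2f} (averaged over 50 runs)")
```

Averaged over the repetitions, the winning candidate's in-sample correlation is sizeable while its held-out correlation hovers around zero. The winner was selected for its luck, and luck does not carry over to new data.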

Where it goes next

Majority Trap (shares tag: generalization)
80/20 Rule (shares tag: statistics)
Bar Chart (shares tag: statistics)
Base Rate (shares tag: statistics)
Category Error (shares tag: generalization)
Causation (shares tag: statistics)
Central Tendency (shares tag: statistics)
Clinical Trial (shares tag: statistics)
Conditional Value-at-Risk (shares tag: statistics)
Confidence Interval (shares tag: statistics)
Correlation (shares tag: statistics)
Correlation Coefficient (shares tag: statistics)
Correlation vs Causation (shares tag: statistics)
Cost-Effectiveness (shares tag: statistics)
Data Literacy (shares tag: statistics)
Decision Under Uncertainty (shares tag: statistics)
Descriptive Statistics (shares tag: statistics)
Discrete Data (shares tag: statistics)
Distribution (Market Phase) (shares tag: statistics)
Distributions (shares tag: statistics)
Dollar Street (shares tag: statistics)
Doubling Line (shares tag: statistics)
Epidemiology (shares tag: statistics)
Experimental Design (shares tag: statistics)
Failure Rate (shares tag: statistics)
Frequentist Probability (shares tag: statistics)
Frightening vs Dangerous (shares tag: statistics)
Histogram (shares tag: statistics)
Hypothesis Testing (shares tag: statistics)
Income Levels (shares tag: statistics)
Information Coefficient (shares tag: statistics)
Least Squares (shares tag: statistics)
Level vs Direction (shares tag: statistics)
Linear Regression (shares tag: statistics)
Lonely Number (shares tag: statistics)
Machine Learning (shares tag: machine-learning)
Mean (shares tag: statistics)
Mean Reversion (shares tag: statistics)
Measurement Error (shares tag: statistics)
Median (shares tag: statistics)
Misleading Statistics (shares tag: statistics)
Mutually Exclusive (shares tag: statistics)
Neural Network (shares tag: machine-learning)
Null Hypothesis (shares tag: statistics)
P-Value (shares tag: statistics)
Peak Child (shares tag: statistics)
Per Capita Ratio (shares tag: statistics)
Percentage (shares tag: statistics)
Performance Rank (shares tag: statistics)
Pie Chart (shares tag: statistics)
Placebo Effect (shares tag: statistics)
Polling (shares tag: statistics)
Population Projection (shares tag: statistics)
Precision vs. Accuracy (shares tag: statistics)
Principal Component Analysis (shares tag: statistics)
Probability (shares tag: statistics)
Questionnaire Design (shares tag: statistics)
Random Sample (shares tag: statistics)
Randomisation (shares tag: statistics)
Rank Correlation (shares tag: statistics)
Regression to the Mean (shares tag: statistics)
Returns (shares tag: statistics)
Risk Calculation (shares tag: statistics)
Rolling Metrics (shares tag: statistics)
S-Curve (shares tag: statistics)
Sample Size (shares tag: statistics)
Sampling (shares tag: statistics)
Sampling Bias (shares tag: statistics)
Sampling Distribution (shares tag: statistics)
Significance Level (shares tag: statistics)
Simpson's Paradox (shares tag: statistics)
Size Instinct (shares tag: statistics)
Slow Change (shares tag: statistics)
Small Steps (shares tag: statistics)
Spurious Correlation (shares tag: statistics)
Standard Deviation (shares tag: statistics)
Statistical Inference (shares tag: statistics)
Statistical Significance (shares tag: statistics)
Stereotype (shares tag: generalization)
Straight Line Instinct (shares tag: statistics)
Symbolic AI (shares tag: machine-learning)
Time-Series Data (shares tag: statistics)
Z-Score (shares tag: statistics)
