Concept

Least Squares

Definition

Least squares is a rule for choosing the best-fitting line, curve, or model through a set of data points. For each observation, the model's predicted value differs from the actual value by a residual; least squares picks the parameters that make the sum of the squared residuals as small as possible. Squaring serves two purposes: it prevents positive and negative deviations from cancelling, and it penalizes large errors disproportionately, so a single far-off point pulls the fit harder than several small misses combined.

Developed by Legendre and Gauss in the early nineteenth century to reconcile astronomical observations, least squares is now the default fitting method behind linear regression and many of its extensions.

Why it matters

How it works

For a simple linear model that predicts a response from a single predictor, the goal is to pick values for the intercept and slope so that the sum of squared vertical distances from each data point to the line is minimized. Calculus gives a tidy result: the optimal slope equals the covariance of the predictor and response divided by the variance of the predictor, and the optimal intercept makes the line pass through the point at the mean of both variables. No iterative search is needed; the formulas drop out directly.

The same principle generalizes. With many predictors, the closed-form solution uses matrix algebra (the normal equations) to give the coefficient vector in one step. The geometric interpretation is that the fitted values are the projection of the response vector onto the column space of the predictor matrix — the closest reachable point under the squared-distance metric. The squared-distance criterion is so analytically convenient that statisticians lived with its sensitivity to outliers for centuries before robust alternatives became practical.

Where it goes next

Continue exploring

Tags