Definition
Linear regression is a statistical procedure that estimates the straight-line relationship between a response variable and one or more predictor variables. It models the response as a weighted sum of the predictors plus an intercept, then chooses the weights and intercept that best fit the observed data. With a single predictor it produces a line through a scatterplot; with several it produces a plane or a higher-dimensional hyperplane.
Despite its simplicity, linear regression is the most widely used statistical model. It is the standard tool for quantifying how one variable depends on others, for forecasting, and for adjusting comparisons to control for confounding factors.
Why it matters
How it works
The model assumes the response equals a linear combination of the predictors plus an unobserved error term, then estimates the coefficients by least squares — picking the values that minimize the sum of squared differences between observed and predicted responses. For simple linear regression with one predictor, the slope is computed as the covariance of the two variables divided by the variance of the predictor, and the line passes through the point at their joint means. The result is a line that captures the dominant trend while accepting that individual points scatter around it.
Once fitted, the regression produces several outputs in addition to the line itself. The coefficient of determination (R-squared) reports the proportion of variance in the response explained by the model — a value near one means the line tracks the data tightly, near zero means it barely improves on a flat horizontal line. Each coefficient comes with a standard error and a hypothesis test against the null that the variable has no effect. Residuals — the leftover deviations from the line — are inspected for patterns that would signal a violation of the model's assumptions, such as nonlinearity or non-constant variance.