Residuals and Ordinary Least Squares
Imagine plotting house prices against the number of cats in the neighborhood. You'd see something like the chart in the workbook: dots scattered with no discernible pattern. Cats tell you nothing about prices.
Now draw any line through that scatter. For each house, the residual is the actual price minus the line's prediction: the signed vertical distance from the line to the dot. Points above the line have positive residuals (the model underpredicted). Points below have negative residuals (the model overpredicted). A residual of zero means the prediction was exactly right.
Residuals measure prediction error. The line is your model; the residuals are everywhere it's wrong.
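As a minimal sketch of that definition, the snippet below computes residuals for a hand-picked line. The five houses, their cat counts, and the line price = 200 + 20 * cats are all invented for illustration (prices in thousands of dollars); they are not the workbook's data.

```python
# Toy data, invented for illustration (prices in thousands of dollars).
cats = [3, 0, 4, 1, 2]
prices = [150, 200, 260, 280, 360]

def predict(x, intercept, slope):
    """Price the line predicts for a neighborhood with x cats."""
    return intercept + slope * x

def describe(r):
    """Say which side of the line a point falls on, given its residual."""
    if r > 0:
        return "above the line (underpredicted)"
    if r < 0:
        return "below the line (overpredicted)"
    return "on the line (exactly right)"

# A hand-picked line, not the OLS line: price = 200 + 20 * cats
intercept, slope = 200, 20

# Residual = actual price minus predicted price, one per house.
res = [y - predict(x, intercept, slope) for x, y in zip(cats, prices)]

for x, y, r in zip(cats, prices, res):
    print(f"cats={x}  actual={y}  residual={r:+}  {describe(r)}")
```

Changing `intercept` and `slope` moves the line and changes every residual, which is what dragging the line in the chart does.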
Here's the problem with simply summing residuals: for any line that passes through the point of means (average predictor value, average price), which every OLS line with an intercept does, the positive and negative residuals cancel exactly. The sum is zero for a terrible line and an excellent one alike, so it tells you nothing about fit.
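The cancellation can be checked numerically. This sketch reuses invented toy data (prices in thousands of dollars) and tries several slopes, sensible and absurd, for lines forced through the point of means. The helper name `residual_sum` is made up for this example.

```python
from statistics import mean

# Toy data, invented for illustration (prices in thousands of dollars).
cats = [3, 0, 4, 1, 2]
prices = [150, 200, 260, 280, 360]

x_bar, y_bar = mean(cats), mean(prices)  # the point of means

def residual_sum(slope):
    """Sum of residuals for the line through (x_bar, y_bar) with this slope."""
    return sum(y - (y_bar + slope * (x - x_bar)) for x, y in zip(cats, prices))

# Wildly different lines, all passing through the point of means.
sums = {slope: residual_sum(slope) for slope in (-100, 0, 5, 100)}
for slope, total in sums.items():
    print(f"slope={slope:>5}  sum of residuals={total}")
```

Every slope gives a sum of zero, because the deviations of the predictor from its own mean always add up to zero.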
To solve this, let's square each residual before summing. Squaring does two things. First, it makes everything positive — no more cancellation. Second, it penalizes large errors more than small ones: a residual of 100k contributes 10 billion to the sum, while a residual of 10k contributes only 100 million. The line "cares" more about the houses it gets badly wrong.
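To illustrate, the sketch below compares two lines through the same point of means: a flat one and a steep one. Both have a residual sum of zero, but their sums of squared residuals differ sharply. The data and both lines are invented for this example (prices in thousands of dollars).

```python
# Toy data, invented for illustration (prices in thousands of dollars).
cats = [3, 0, 4, 1, 2]
prices = [150, 200, 260, 280, 360]

def residuals(intercept, slope):
    """Actual minus predicted price for each house."""
    return [y - (intercept + slope * x) for x, y in zip(cats, prices)]

def sum_of_squares(intercept, slope):
    """Square each residual, then add them up."""
    return sum(r ** 2 for r in residuals(intercept, slope))

flat = (250, 0)    # flat line at the mean price
steep = (50, 100)  # steep line through the same point of means (2, 250)

for name, (a, b) in (("flat", flat), ("steep", steep)):
    print(f"{name:>5}: raw sum={sum(residuals(a, b))}  squared sum={sum_of_squares(a, b)}")

# The arithmetic from the text: a 10x larger error gets a 100x larger penalty.
big, small = 100_000 ** 2, 10_000 ** 2
print(big, small, big // small)
```

The raw sums cannot tell the two lines apart. The squared sums can.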
Ordinary Least Squares
The sum of squared errors (SSE), also called the sum of squared residuals, is our measure of total model error:
$$\mathrm{SSE} = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$
where $y_i$ is the actual price of house $i$ and $\hat{y}_i$ is the line's prediction. Ordinary least squares (OLS) finds the line that minimizes SSE. Calculus gives a closed-form solution; no iteration required. In Python, sklearn.linear_model.LinearRegression and statsmodels.api.OLS both solve it directly.
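For a single predictor, the closed-form solution can be written out by hand: the slope is the sum of (x - mean x)(y - mean y) divided by the sum of (x - mean x) squared, and the intercept is mean y minus slope times mean x. The sketch below implements that in plain Python instead of calling either library. The function names `ols_fit` and `sse` and the toy data are assumptions made for this example (prices in thousands of dollars).

```python
from statistics import mean

# Toy data, invented for illustration (prices in thousands of dollars).
prices = [150, 200, 260, 280, 360]
cats = [3, 0, 4, 1, 2]
bedrooms = [1, 2, 3, 4, 5]

def ols_fit(xs, ys):
    """Closed-form OLS for one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sse(xs, ys, intercept, slope):
    """Sum of squared residuals for the given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

cat_line = ols_fit(cats, prices)
bed_line = ols_fit(bedrooms, prices)
print("cats:    ", cat_line, "SSE:", sse(cats, prices, *cat_line))
print("bedrooms:", bed_line, "SSE:", sse(bedrooms, prices, *bed_line))

# Nudging the OLS line in any direction should only increase SSE.
a, b = bed_line
best = sse(bedrooms, prices, a, b)
nudged = [
    sse(bedrooms, prices, a + da, b + db)
    for da in (-5, 0, 5)
    for db in (-2, 0, 2)
    if (da, db) != (0, 0)
]
print("every nudge is worse:", all(s > best for s in nudged))
```

Fitting the same data with LinearRegression or statsmodels' OLS (with an intercept term) should give the same intercept and slope up to floating-point rounding.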
[Interactive chart: house price against the chosen predictor. Amber bars are residuals; the dashed gray line is the OLS solution. With cats as the predictor there is no real relationship: the OLS line is nearly flat, because cats don't predict prices.]
Drag the line and watch the residuals (vertical bars) and SSE update in real time. Try: start with cats as the predictor — can you beat the OLS line? Then switch to bedrooms and see how much tighter the OLS solution is.
Notice what happened when you switched from cats to bedrooms: the OLS line's SSE dropped substantially. The residuals are smaller and more symmetric. This is a better-fitting model — not because we tried harder, but because bedrooms actually carry information about price. Cats don't.
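That's the intuition behind model quality: a predictor is useful if it reduces SSE meaningfully compared to knowing nothing (a flat line at the mean price). An OLS fit can never do worse than that flat line on the data it was fit to, so the question is how much better it does. If the reduction is negligible, adding the predictor to your model just adds noise.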
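That comparison can be made concrete by measuring the fraction of the baseline SSE that each predictor removes. The sketch below does this on invented toy data (prices in thousands of dollars). The helper names and the `reduction` dictionary are assumptions for this example, not part of the workbook.

```python
from statistics import mean

# Toy data, invented for illustration (prices in thousands of dollars).
prices = [150, 200, 260, 280, 360]
predictors = {
    "cats": [3, 0, 4, 1, 2],
    "bedrooms": [1, 2, 3, 4, 5],
}

def ols_fit(xs, ys):
    """Closed-form OLS for one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

def sse(xs, ys, intercept, slope):
    """Sum of squared residuals for the given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# "Knowing nothing": predict the mean price for every house.
y_bar = mean(prices)
baseline_sse = sum((y - y_bar) ** 2 for y in prices)

reduction = {}
for name, xs in predictors.items():
    intercept, slope = ols_fit(xs, prices)
    model_sse = sse(xs, prices, intercept, slope)
    # Fraction of the baseline error that this predictor removes.
    reduction[name] = 1 - model_sse / baseline_sse
    print(f"{name:>8}: SSE={model_sse:.0f}  baseline={baseline_sse}  reduction={reduction[name]:.1%}")
```

On this toy data, bedrooms remove almost all of the baseline error while cats remove almost none, mirroring the cats-versus-bedrooms contrast in the chart.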
You draw two lines through the same house-price scatter plot. Line A has a raw sum of residuals of 0. Line B has a sum of squared residuals of 0. Which line fits the data better?