Multiple Regression
Bedrooms alone is a limited model. House price is shaped by location, square footage, proximity to good schools, whether there's a pool, and dozens of other factors. We need more than one predictor.
Multiple linear regression extends the single-predictor model to as many features as you have. For our house-price problem, suppose we add location (distance from city center in miles) and whether the house has a pool:
Price is a weighted sum of predictors plus a baseline. OLS still minimizes SSE — it now finds the hyperplane (rather than a line) that does so. Python handles this identically: just pass a multi-column feature matrix.
A Fitted Multiple Regression
Suppose OLS returns:
Reading each coefficient:
- : Holding distance and pool constant, each additional bedroom adds $60,000 to predicted price.
- : Holding bedrooms and pool constant, each additional mile from city center reduces predicted price by $8,000.
- : Holding bedrooms and distance constant, having a pool adds $45,000 to predicted price.
Prediction: A 3-bedroom house, 5 miles from center, with a pool:
The Critical Clause: Holding All Else Constant
Every coefficient in a multiple regression carries a hidden clause: holding all other predictors constant.
Example: the coefficient on bedrooms was 65,000 in the simple model (Section 3) and 60,000 in the multiple model above. The difference is because the multiple model has already "absorbed" some of the bedroom effect via distance and pool. Coefficients shift when you add or remove predictors.
A dramatic version of this is sign reversal. In a dataset of houses, more bedrooms tends to mean higher price — a positive raw correlation. But once you control for square footage, an extra bedroom sometimes means smaller rooms, which can flip the coefficient negative. Interpreting a coefficient in isolation, without knowing what else is in the model, is unreliable.
The same model serves all three purposes. The question you're asking determines what you look at in the output.
You build a multiple regression predicting house price from: bedrooms, square footage, and distance from city center. The coefficient on bedrooms is −$12,000. What is the most likely explanation?