Interpreting Regression Coefficients
Now that we have the OLS line for bedrooms vs. house price, let's see what it actually means:
price = 65,000 × bedrooms + 100,000
This is the same y = mx + b you know from algebra!
- 65,000 is the slope (m): each additional bedroom is associated with a $65,000 increase in predicted price.
- 100,000 is the intercept (b): the predicted price of a house with zero bedrooms.
The slope tells you how much the predicted price changes per bedroom; the intercept is the baseline the line starts from.
Reading the Equation
How much does a 3-bedroom house cost, according to this model?
price = 65,000 × 3 + 100,000 = 295,000
What about a 5-bedroom house?
price = 65,000 × 5 + 100,000 = 425,000
The difference, 130,000 for two extra bedrooms, is exactly 2 × 65,000. The slope is constant: the model says every bedroom adds the same amount, whether you're going from 1→2 or 4→5.
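The arithmetic above can be checked with a short Python sketch. The function name `predict_price` and the constant names are illustrative choices, not part of the lesson itself; the slope and intercept are the ones from the fitted line.

```python
SLOPE = 65_000       # dollars per additional bedroom
INTERCEPT = 100_000  # predicted price at zero bedrooms


def predict_price(bedrooms):
    """Predicted price from the fitted line: slope * bedrooms + intercept."""
    return SLOPE * bedrooms + INTERCEPT


price_3 = predict_price(3)
price_5 = predict_price(5)
print(price_3)            # 295000
print(price_5)            # 425000
print(price_5 - price_3)  # 130000, i.e. 2 * SLOPE

# The step between consecutive bedroom counts is the same everywhere on the line.
steps = [predict_price(n + 1) - predict_price(n) for n in range(1, 5)]
print(steps)  # [65000, 65000, 65000, 65000]
```

Every entry in `steps` equals the slope, which is what "the slope is constant" means in practice.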
When the Intercept Doesn't Make Sense
The intercept here is $100,000 — the predicted price of a house with zero bedrooms. Does that make sense? Maybe a studio apartment or a parking space. But if your dataset contains only 2–6 bedroom houses, then zero bedrooms is an extrapolation far outside your data. The intercept is a mathematical anchor for the line, not always a meaningful real-world quantity.
Be especially careful when the intercept is negative. A model predicting sale time from listing price might give a negative time for very cheap homes, which is physically impossible.
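One way to make the extrapolation concern concrete is to have the prediction report whether a query falls inside the range the model was fitted on. The sketch below assumes the 2–6 bedroom range from the example above; `predict_with_range_check` and its return format are hypothetical names chosen for illustration.

```python
SLOPE = 65_000
INTERCEPT = 100_000
# Assumed range of bedroom counts in the training data (the 2-6 example).
MIN_BEDROOMS, MAX_BEDROOMS = 2, 6


def predict_with_range_check(bedrooms):
    """Return (predicted_price, is_extrapolation) for a bedroom count."""
    predicted = SLOPE * bedrooms + INTERCEPT
    is_extrapolation = not (MIN_BEDROOMS <= bedrooms <= MAX_BEDROOMS)
    return predicted, is_extrapolation


for n in (0, 3, 8):
    price, extrapolated = predict_with_range_check(n)
    note = "extrapolation, treat with caution" if extrapolated else "within observed range"
    print(f"{n} bedrooms: ${price:,} ({note})")
```

The line still produces a number at 0 or 8 bedrooms; the flag is only a reminder that no data supports the prediction there.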
[Interactive plot: bedrooms (bd) vs. predicted house price ($k), showing the fitted line and a movable probe point (orange dot); dashed lines mark the predicted value.]
Slope: for each additional bedroom, predicted house price increases by 65.0 $k.
Intercept: when bedrooms = 0, predicted house price = 100 $k (a studio or anchor point, not always interpretable).
Example probe at 3.5 bedrooms: ŷ = 65.0 × 3.5 + 100 = 327.5 ≈ 328 $k
Use the probe to predict prices at specific bedroom counts. What does the intercept tell you about the predicted price at zero bedrooms?
Correlation ≠ Causation
The slope of 65,000 says bedrooms and price are associated, not that adding bedrooms causes a house to be worth more. Houses with more bedrooms also tend to have more square footage, sit in better neighborhoods, and differ in other correlated features. Simple linear regression can't separate those effects.
The regression line describes the data you have. Claims about causation require study design, natural experiments, or explicit controls for confounders.
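A small made-up dataset can illustrate how a confounder produces a slope with no causal effect behind it. In the sketch below, price is constructed to depend only on square footage (an assumed $200 per square foot), yet regressing price on bedrooms alone still yields a large positive slope. The data, the `ols_slope` helper, and the pricing rule are all invented for illustration.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Made-up houses: (bedrooms, square feet). Bigger houses tend to have more bedrooms.
houses = [(2, 1000), (2, 1200), (3, 1200), (3, 1500),
          (4, 1500), (4, 1800), (5, 1800), (5, 2200)]

# By construction, price depends ONLY on square footage ($200 per square foot).
bedrooms = [b for b, _ in houses]
prices = [200 * sqft for _, sqft in houses]

slope = ols_slope(bedrooms, prices)
print(f"price per extra bedroom (simple regression): ${slope:,.0f}")

# Holding square footage fixed, an extra bedroom changes nothing:
# the 2-bedroom and 3-bedroom houses with 1,200 sq ft have the same price.
print(prices[1], prices[2])
```

The simple regression reports $60,000 per bedroom even though bedrooms have no direct effect in this data; the slope is picking up square footage, which travels with bedroom count.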
A model gives: price = 60,000 × bedrooms + 80,000. A house has 4 bedrooms and sold for $310,000. What is the residual?