
Predicting house price using year & size

1. Predicting house price using year & size

Let's now use your first multiple regression model to make predictions! Just as you did for basic regression, let's first visually illustrate what it means to make predictions and then make explicit numerical ones.

2. Refresher: regression plane

Let's consider only the regression plane from the last video to make predictions visually. Say a house is for sale in Seattle and all you know is that its log10-size is 3.07 and that it was built in 1980. What is a good prediction of its log10-price?

3. Regression plane for prediction

This combination of year = 1980 and log10-size = 3.07, marked with the intersecting dashed lines, corresponds to a fitted value y-hat of 5.45 on the regression plane. This value is marked with a red dot. Let's now do the same, but numerically.

4. Predicted value

Let's regenerate the corresponding regression table by first fitting the model and then using the get_regression_table() function. Recall that in multiple regression, all fitted slope coefficients are interpreted "taking into account all other variables". So, for example, taking into account the size of the house, for every additional year in recency of the house's construction, there is an associated decrease of, on average, 0.00138 in log10-price (a slope of -0.00138), suggesting a negative relationship.
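To make the slope interpretation more concrete, here is a minimal R sketch. It uses only the year slope of -0.00138 quoted above; the helper name `price_ratio_per_year` is hypothetical and not part of the course code. Because the outcome is log10-price, a change of -0.00138 in log10-price corresponds to multiplying the predicted price by 10^(-0.00138), holding size fixed.

```r
# Slope for year, as quoted from the regression table in the text
slope_year <- -0.00138

# Hypothetical helper: multiplicative change in predicted price for a given
# change in year of construction, taking log10-size into account (held fixed)
price_ratio_per_year <- function(years, slope = slope_year) {
  10^(slope * years)
}

ratio_1  <- price_ratio_per_year(1)   # house built one year later
ratio_10 <- price_ratio_per_year(10)  # house built ten years later

# Express the one-year effect as a percentage change in predicted price
pct_change_1 <- (ratio_1 - 1) * 100
print(round(pct_change_1, 2))
```

Read this way, the slope suggests that each additional year of recency is associated with a predicted price roughly a third of a percent lower, for houses of the same size.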

5. Predicted value

Let's now use these values to numerically make the prediction you visually made earlier. You plug in log10-size = 3.07 and year = 1980 into the fitted equation. This yields a fitted value for log10-price of 5.45. You undo the log10-transformation by raising 10 to the power of this value, yielding a predicted price for this house of about $282K.
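The back-transformation step can be sketched in a few lines of R. This only reproduces the arithmetic on the fitted value of 5.45 quoted above; the variable names are illustrative, and the intercept and size slope are not restated in this transcript, so the plug-in step itself is shown only as a comment.

```r
# General form of the fitted equation (coefficients come from the regression table):
#   log10_price_hat = intercept + slope_size * log10_size + slope_year * year

# Fitted log10-price for the example house, as quoted in the text
log10_price_hat <- 5.45

# Undo the log10 transformation: price = 10^(log10-price)
price_hat <- 10^log10_price_hat

# Report in thousands of dollars, rounded
print(round(price_hat / 1000))
```

Rounded to the nearest thousand, this matches the predicted price of about $282K.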

6. Computing all predicted values and residuals

Just as you did for your predictive modeling examples from the last chapter, let's automate what I just did for all 21k houses using the get_regression_points() function from the moderndive package. You previously saw that this function returns information on each point involved in a regression model. In particular, the second-to-last column, log10_price_hat, contains the fitted/predicted values, as I manually computed earlier for our example house, and the last column contains the residuals, i.e., the observed log10-price minus the predicted log10-price. Using the residuals, let's compute a measure of the model's fit, or more precisely speaking, lack thereof. Let's take the 3D scatterplot and regression plane from before and mark a selection of residuals.
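As a rough illustration of what such a table of regression points contains, here is a base R sketch on a tiny made-up data set. The data frame `toy` and its values are invented for illustration and are not the house data; the column names merely mirror the ones mentioned above.

```r
# Toy data standing in for the house data; all values are made up
toy <- data.frame(
  log10_price = c(5.30, 5.45, 5.60, 5.52, 5.71, 5.38),
  log10_size  = c(3.00, 3.07, 3.25, 3.15, 3.40, 3.05),
  year        = c(1955, 1980, 1995, 1970, 2005, 1962)
)

# Multiple regression with two numerical explanatory variables
model <- lm(log10_price ~ log10_size + year, data = toy)

# One row per observation: observed value, fitted value, and residual
reg_points <- data.frame(
  log10_price     = toy$log10_price,
  log10_price_hat = fitted(model),
  residual        = toy$log10_price - fitted(model)  # observed minus predicted
)
print(reg_points)
```

The residual column here is computed by hand as observed minus fitted, which is the same definition the transcript gives for the last column.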

7. Best fit and residuals

We plot an arbitrarily chosen set of residuals with red vertical lines. Remember, residuals are the discrepancies between the observed values, marked by the blue dots, and the fitted/predicted values, marked by the corresponding points on the regression plane. These correspond to the epsilon error term in the general modeling framework we saw in Chapter 1. Say you compute the residual for all 21k points, square the residuals, and sum them. You saw earlier that this quantity is called the "sum of squared residuals". It is a numerical summary of the "lack-of-fit" of a model to a set of points, in this case the regression plane. Hence, larger values of the sum of squared residuals indicate poorer fit, and smaller values indicate better fit. Just as with "best-fitting" regression lines, of all possible planes the regression plane minimizes the sum of squared residuals. This is what is meant by "best fitting" plane. Let's now compute the sum of squared residuals.
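The "best fitting" property can be checked numerically. The sketch below, again on made-up toy data rather than the house data, computes the sum of squared residuals for the fitted plane and for a few slightly shifted planes; the function name `ssr` is hypothetical.

```r
# Toy data standing in for the house data; all values are made up
toy <- data.frame(
  log10_price = c(5.30, 5.45, 5.60, 5.52, 5.71, 5.38),
  log10_size  = c(3.00, 3.07, 3.25, 3.15, 3.40, 3.05),
  year        = c(1955, 1980, 1995, 1970, 2005, 1962)
)

# Hypothetical helper: sum of squared residuals for any plane, described by
# (intercept, slope for log10_size, slope for year)
ssr <- function(coefs, data) {
  fitted_vals <- coefs[1] + coefs[2] * data$log10_size + coefs[3] * data$year
  sum((data$log10_price - fitted_vals)^2)
}

# Coefficients of the regression plane
best <- coef(lm(log10_price ~ log10_size + year, data = toy))
ssr_best <- ssr(best, toy)

# Nudge each coefficient in turn and recompute the sum of squared residuals
ssr_shift_intercept <- ssr(best + c(0.01, 0, 0), toy)
ssr_shift_size      <- ssr(best + c(0, 0.01, 0), toy)
ssr_shift_year      <- ssr(best + c(0, 0, 0.00001), toy)

print(c(best = ssr_best, intercept = ssr_shift_intercept,
        size = ssr_shift_size, year = ssr_shift_year))
```

Each shifted plane should give a larger sum of squared residuals than the regression plane, which is what "minimizes" means here.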

8. Sum of squared residuals

You start with all 21k residuals as shown above. You then square them using mutate() and then summarize() the squared residuals with their sum. The resulting value of 585 is hard to interpret in absolute terms. However, in relative terms it can be used to compare fits of different models that use different explanatory variables, and hence allows us to identify which models fit "best". This is a theme we'll revisit in Chapter 4 on model assessment and selection.
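The square-then-sum steps, and the idea of comparing models by this quantity, can be sketched as follows. This is written in base R on made-up toy data so that it runs without extra packages; it is not the course's mutate()/summarize() code, and the value of 585 applies only to the real house data.

```r
# Toy data standing in for the house data; all values are made up
toy <- data.frame(
  log10_price = c(5.30, 5.45, 5.60, 5.52, 5.71, 5.38),
  log10_size  = c(3.00, 3.07, 3.25, 3.15, 3.40, 3.05),
  year        = c(1955, 1980, 1995, 1970, 2005, 1962)
)

# Two candidate models using different explanatory variables
model_size_year <- lm(log10_price ~ log10_size + year, data = toy)
model_size_only <- lm(log10_price ~ log10_size, data = toy)

# Step 1: square each residual (the mutate() step in the text)
sq_residuals_size_year <- residuals(model_size_year)^2
sq_residuals_size_only <- residuals(model_size_only)^2

# Step 2: sum the squared residuals (the summarize() step in the text)
ssr_size_year <- sum(sq_residuals_size_year)
ssr_size_only <- sum(sq_residuals_size_only)

print(c(size_and_year = ssr_size_year, size_only = ssr_size_only))
```

The two sums are only meaningful relative to each other: the model with the smaller value fits this particular set of points more closely.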

9. Let's practice!

Your turn. Let's use size and number of bedrooms as predictor variables instead, and start predicting house prices!
