Predicting house price using size & condition

1. Predicting house price using size & condition

Just as with multiple regression models using two numerical predictor variables, let's now predict house prices using models with one numerical and one categorical predictor variable. Previously, however, you made predictions on the same data you used to fit the model, so you had access to the true observed sale prices of those houses. Now let's consider a different scenario: making predictions on "new data". Think of it this way: say the market hasn't changed much and a completely new house is put up for sale. When you make the prediction, you won't know how good it is, since you don't yet know the true observed sale price. Only after the house is sold can you compare the prediction to the truth.

2. Refresher: Parallel slopes

Recall our plot of the parallel slopes model, where for each of the 5 levels of condition, you plot a separate regression line of log10_price over log10_size. While these 5 lines shared a common slope, they had different intercepts, with the houses of condition 5 having the highest intercept.

3. Making a prediction

Say two new houses enter the market. One is of condition 3 and log10_size 2.9 as marked with the green dashed line on the left. The other is of condition 4 and log10_size 3.6 as marked with the blue dashed line on the right. What are this model's predicted log10 prices? They're the points at which the blue and green dashed lines intersect their corresponding regression lines!

4. Visualizing predictions

For the first house marked with the green dashed line, you'd predict a log10_price of about 5.4, for a sale price of 10^5.4 or about $250K. For the other house marked with the blue dashed line, you'd predict a log10_price of just under 6, for a sale price of just under 10^6 = one million dollars. Great! Now instead of just making visual predictions, let's also make explicit numerical ones.

5. Numerical predictions

Recall our regression table, where the intercept corresponds to the baseline group condition 1, the common slope is associated with log10_size, and the 4 offsets correspond to conditions 2 through 5. The predicted log10_price for the first house is: the intercept 2.88, plus the offset for condition 3 houses 0.032, plus 0.837 times the house's log10_size 2.9. The predicted log10_price for the second house is similarly the intercept 2.88, plus the offset for condition 4 houses 0.0440, plus 0.837 times the house's log10_size 3.6. While doing this by hand is fine for two new houses, imagine doing it for 1000 new houses! That would take forever! Fortunately, if the new houses' information is saved in a spreadsheet or data frame, you can automate the above procedure.
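These two hand calculations can be reproduced in base R. The coefficient values below are the ones quoted from the regression table above:

```r
# Coefficients from the regression table quoted in the text
intercept  <- 2.88
slope_size <- 0.837
offset_c3  <- 0.032   # offset for condition 3 houses
offset_c4  <- 0.0440  # offset for condition 4 houses

# House 1: condition 3, log10_size 2.9
pred1 <- intercept + offset_c3 + slope_size * 2.9
# House 2: condition 4, log10_size 3.6
pred2 <- intercept + offset_c4 + slope_size * 3.6

round(pred1, 2)  # 5.34
round(pred2, 2)  # 5.94
```

Raising 10 to these powers gives the dollar-scale predictions: 10^5.34 is roughly $220K and 10^5.94 is roughly $870K, matching the visual predictions from the plot.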

6. Defining "new" data

Let's represent these two new houses in a manually created data frame new_houses using the data_frame() function from the dplyr package. Observe: new_houses has two rows, corresponding to our two new houses, and two variables whose names and formats exactly match those in the original data frame house_prices. For example, the variable condition was saved as a categorical variable in the original data frame house_prices, so in new_houses we convert the condition values 3 & 4 from numerical to categorical using the factor() function.
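A minimal sketch of how such a data frame could be created, using the values for the two houses described above:

```r
library(dplyr)

# Two new houses, with column names and types matching house_prices:
# condition is stored as a factor, just as in the original data frame.
new_houses <- data_frame(
  log10_size = c(2.9, 3.6),
  condition  = factor(c(3, 4))
)
new_houses
```

Note that in current dplyr versions data_frame() is deprecated in favor of tibble(), which can be used identically here.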

7. Making predictions using new data

You can once again use the get_regression_points() function to automate the prediction process, but this time with a new argument. You set newdata to the data frame new_houses, indicating that you want to apply the fitted model to a new set of observations. Observe that the output contains predicted values log10_price_hat, just like before.
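A self-contained sketch of this workflow, assuming the moderndive package (which ships the house_prices dataset and get_regression_points()); the object name model_price is an assumption for illustration and may differ from the name used in the course:

```r
library(dplyr)
library(moderndive)

# Create the log10-transformed variables used by the model
house_prices <- house_prices %>%
  mutate(log10_price = log10(price),
         log10_size  = log10(sqft_living))

# Fit the parallel slopes model: common slope, per-condition intercepts
model_price <- lm(log10_price ~ log10_size + condition,
                  data = house_prices)

# Two new houses to predict on
new_houses <- data_frame(log10_size = c(2.9, 3.6),
                         condition  = factor(c(3, 4)))

# Apply the fitted model to the new observations
get_regression_points(model_price, newdata = new_houses)
```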

8. Making predictions using new data

Now say you want to obtain predictions of price instead of log10_price. You use the mutate() function to raise 10 to the power of the variable log10_price_hat, obtaining price_hat. Our predicted house prices are about $220K and $870K respectively, matching the visual predictions made earlier.
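A sketch of this step, assuming model_price is the fitted parallel slopes model and new_houses is the two-row data frame of new houses (both names are assumptions for illustration):

```r
library(dplyr)
library(moderndive)

# Convert the predicted log10 prices back to dollar amounts
get_regression_points(model_price, newdata = new_houses) %>%
  mutate(price_hat = 10^log10_price_hat)
```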

9. Let's practice!

For our final set of exercises for Chapter 3 on multiple regression, you'll now similarly make your own predictions on a set of "new" houses.