1. Multiple linear regression
You have seen multiple linear regression in previous courses, and throughout this course we have discussed inference in the simple linear model. This video brings those ideas together by walking through how to interpret inference in the multiple regression model.
2. Bathrooms negative coefficient
Recall from the previous exercise that, somewhat unexpectedly, the coefficient on number of bathrooms switched sign: it was positive in the single-variable model of home price, and negative once the size of the home was also included in the model.
The reason for the switch in sign of the coefficient is similar to the example we saw with the coins. Here, for a house of a given square footage, more bathrooms means that less of that square footage is used for bedrooms and other usable space (thus reflecting a lower average home price).
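The sign switch can be reproduced on a small made-up data set. The sketch below is only an illustration: the six houses, the variable names, and the helper functions simple_slope and two_predictor_slopes are all assumptions for this example, not the homes data from the course. Price is constructed to rise with square feet and fall with bathrooms at a fixed square footage, while bathrooms and square feet are strongly positively correlated.

```python
# Synthetic illustration (not the course data): six hypothetical houses.
bath = [1, 1, 2, 2, 3, 3]
sqft = [900, 1100, 1800, 2200, 2700, 3300]
# Prices built exactly as 100 + 0.2 * sqft - 10 * bath (no noise).
price = [270, 310, 440, 520, 610, 730]


def mean(xs):
    return sum(xs) / len(xs)


def simple_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_predictor_slopes(x1, x2, y):
    """Least-squares slopes of y on x1 and x2 (with an intercept),
    found by solving the 2x2 normal equations on centered data."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Bathrooms alone: the slope also absorbs the effect of house size.
print("price ~ bath:        bath slope =", simple_slope(bath, price))

# Bathrooms with square feet: the slope compares houses of equal size.
b_bath, b_sqft = two_predictor_slopes(bath, sqft, price)
print("price ~ bath + sqft: bath slope =", b_bath, " sqft slope =", b_sqft)
```

In this toy data the bathrooms slope is positive when bathrooms is the only predictor and negative once square feet is included, which is the same pattern described for the homes model.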
3. Bathrooms non-significant coefficient
Notice also that the coefficient on bathrooms is no longer significant, even though it was significant in the model containing only bathrooms.
The significance changes because the hypothesis changes. In the first model, on bathrooms only, the p-value describes the probability of seeing data at least as extreme as ours if there is no relationship between bathrooms and price.
In the second model, the p-value on bathrooms describes the probability of seeing data at least as extreme as ours if there is no relationship between bathrooms and price GIVEN THAT SQFT IS IN THE MODEL.
We interpret the last p-value in the second model as an indication that the bathrooms variable is not needed once square feet is already in the linear model.
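To make the change in hypothesis concrete, the two null hypotheses can be written side by side. The notation below (the coefficient symbols, the error term, and the variable names) is assumed for this sketch rather than taken from the slides, and the response may be price or log price depending on the model fitted.

```latex
\begin{align*}
\text{Model 1:}\quad & \text{price} = \alpha_0 + \alpha_1\,\text{bath} + \varepsilon,
  & H_0 &: \alpha_1 = 0 \\
\text{Model 2:}\quad & \text{price} = \beta_0 + \beta_1\,\text{sqft} + \beta_2\,\text{bath} + \varepsilon,
  & H_0 &: \beta_2 = 0 \ \text{(with sqft held in the model)}
\end{align*}
```

The two tests involve different coefficients: the first measures the overall association between bathrooms and price, while the second measures what bathrooms adds among homes with the same square feet.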
4. Price on bed and bath
Notice that when we regress the log price of the homes on bath and bed (without square feet), both variables are significant. That is, GIVEN BATH IS IN THE MODEL, THE NUMBER OF BEDROOMS IS A SIGNIFICANT PREDICTOR OF PRICE.
Similarly, GIVEN THE NUMBER OF BEDROOMS IS IN THE MODEL, THE NUMBER OF BATHROOMS IS A SIGNIFICANT PREDICTOR OF PRICE.
5. Large model on price
As we saw before, each p-value is now interpreted given ALL THE REMAINING VARIABLES. As we expect, bathrooms is not significant given that square feet and bedrooms are in the model.
However, the other two variables are both significant predictors of log price. That is, the number of bedrooms is a significant predictor of log price, even when square feet and bathrooms are in the model.
Square feet is a significant predictor of log price even when the number of bathrooms and bedrooms are in the model.
6. Let's practice!
We've only used the mathematical model to address significance in the multiple linear regression setting. That's because the permutation test is much harder to implement when working with multiple variables, and it is beyond the scope of this class. And now it's your turn to try some examples.