More than 2 explanatory variables

1. More than 2 explanatory variables

Regression models aren't limited to two explanatory variables. Here, we'll consider three of them, and think about what happens when you increase that number even further.

2. From last time

In the last video you saw this scatter plot with the response variable, mass, indicated by color, and the explanatory variables shown on the x and y axes. You can see several distinct clusters of points. Perhaps these correspond to the different species of fish. We can check this by faceting on species.

3. Faceting by species

Giving each species its own panel with facet_wrap makes the groups of data easier to see. There is a strong positive correlation between length and height for each species of fish. The relationship between the explanatory variables and the response is harder to quantify because you can't read colors as accurately as x and y coordinates, but for each species you can see that as fish get longer and taller they also get heavier. In general, while it is tricky to include more than three numeric variables in a scatter plot, you can include as many categorical variables as you like using faceting. However, more facets can make it harder to see the overall picture. Plotting rapidly becomes difficult as you increase the number of variables to display.
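As a rough sketch of the faceted plot described here, the code below assumes ggplot2 is installed and uses a small made-up data frame. The name fish_demo, its column names, and its values are illustrative assumptions, not the course dataset.

```r
library(ggplot2)

# Made-up stand-in data; the names and values are assumptions for illustration.
set.seed(42)
fish_demo <- data.frame(
  species = rep(c("Bream", "Perch", "Pike"), each = 15),
  length_cm = runif(45, min = 10, max = 60)
)
fish_demo$height_cm <- 0.25 * fish_demo$length_cm + rnorm(45)
fish_demo$mass_g <- 5 * fish_demo$length_cm + 20 * fish_demo$height_cm +
  rnorm(45, sd = 10)

# Two numeric explanatory variables on the axes, the response as color,
# and one panel per level of the categorical variable.
plt <- ggplot(fish_demo, aes(length_cm, height_cm, color = mass_g)) +
  geom_point() +
  facet_wrap(vars(species))
print(plt)
```

Adding a second categorical variable would multiply the number of panels, which is the point where the overall picture starts to get lost.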

4. Different levels of interaction

By contrast, modeling doesn't get much harder as you increase the number of explanatory variables. Here, there are three explanatory variables, and the only change to the formula is one extra plus followed by the new variable. The trickiest part of adding explanatory variables is that there are more options for interactions. This model specifies no interactions between variables. You could also include 2-way, or "pairwise", interactions between each pair of explanatory variables. A third option is to also include the 3-way interaction between all the explanatory variables. This syntax quickly becomes cumbersome to type, so fortunately there are shortcuts.
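To make the three levels of interaction concrete, here is a sketch of the formulas written out in full. The column names mass_g, length_cm, height_cm and species are assumed for illustration, and term_labels is a hypothetical helper that only lists the terms R reads from a formula. No data is needed to inspect a formula this way.

```r
# Assumed column names: mass_g, length_cm, height_cm, species.

# No interactions: just one extra plus per explanatory variable.
no_interactions <- mass_g ~ length_cm + height_cm + species + 0

# 2-way (pairwise) interactions, written with the colon operator.
two_way <- mass_g ~ length_cm + height_cm + species +
  length_cm:height_cm + length_cm:species + height_cm:species + 0

# 2-way interactions plus the single 3-way interaction.
three_way <- mass_g ~ length_cm + height_cm + species +
  length_cm:height_cm + length_cm:species + height_cm:species +
  length_cm:height_cm:species + 0

# Hypothetical helper: list the terms that R extracts from a formula.
term_labels <- function(f) attr(terms(f), "term.labels")

print(term_labels(no_interactions))
print(term_labels(two_way))
print(term_labels(three_way))
```

With three explanatory variables there are three pairwise terms and one 3-way term, so the fully written-out version already has seven terms.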

5. All the interactions

You've already seen the concise syntax for including all the interactions. Simply swap the plus operators for times symbols. Both these formulas mean the same thing. You still need a plus before the zero to denote not including a global intercept term.
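One way to check that the two spellings are equivalent is to compare the terms R extracts from each formula. This is a sketch with assumed column names (mass_g, length_cm, height_cm, species); the variable names explicit and concise are illustrative.

```r
# Assumed column names: mass_g, length_cm, height_cm, species.

# Every interaction written out by hand.
explicit <- mass_g ~ length_cm + height_cm + species +
  length_cm:height_cm + length_cm:species + height_cm:species +
  length_cm:height_cm:species + 0

# The same model with plus swapped for times between the variables.
concise <- mass_g ~ length_cm * height_cm * species + 0

explicit_terms <- attr(terms(explicit), "term.labels")
concise_terms <- attr(terms(concise), "term.labels")

# Both formulas expand to the same set of terms.
same_terms <- setequal(explicit_terms, concise_terms)
print(same_terms)

# The "+ 0" still switches off the global intercept in the concise form.
concise_intercept <- attr(terms(concise), "intercept")
print(concise_intercept)
```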

6. Only 2-way interactions

To get only the 2-way interactions in the model, without the 3-way interaction, you can use a new syntax: wrap the explanatory variables in parentheses and raise them to the power of two. Here, the power operator has a special meaning; it doesn't square the explanatory variable values. To do that, you need to wrap the term in the I function, like you saw in the previous course.
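The difference between the formula meaning of the power operator and arithmetic squaring can be seen by listing the terms of each formula. The sketch below again assumes the column names mass_g, length_cm, height_cm and species.

```r
# Assumed column names: mass_g, length_cm, height_cm, species.

# Power of two on the bracketed variables: main effects plus 2-way interactions.
pairwise <- mass_g ~ (length_cm + height_cm + species)^2 + 0
pairwise_terms <- attr(terms(pairwise), "term.labels")
print(pairwise_terms)

# Without I(), a power on a single variable does not square its values;
# the formula collapses back to the variable itself.
not_squared <- mass_g ~ length_cm^2
not_squared_terms <- attr(terms(not_squared), "term.labels")
print(not_squared_terms)

# Wrapping in I() makes the caret ordinary arithmetic, giving a squared term.
squared <- mass_g ~ length_cm + I(length_cm^2)
squared_terms <- attr(terms(squared), "term.labels")
print(squared_terms)
```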

7. The prediction flow

The prediction flow with an extra variable contains no surprises. This is exactly what you've seen before. Modeling code scales nicely with more variables.
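A minimal sketch of that flow in base R follows, using made-up data so it runs on its own. The data frame fish_demo, its column names, and the grid values are assumptions for illustration; base R's expand.grid stands in for whatever grid helper was used on the slide.

```r
# Made-up stand-in data; the names and values are assumptions for illustration.
set.seed(1)
fish_demo <- data.frame(
  species = rep(c("Bream", "Perch", "Pike"), each = 20),
  length_cm = runif(60, min = 10, max = 60)
)
fish_demo$height_cm <- 0.25 * fish_demo$length_cm + rnorm(60)
fish_demo$mass_g <- 5 * fish_demo$length_cm + 20 * fish_demo$height_cm +
  rnorm(60, sd = 10)

# Step 1: fit the model with three explanatory variables and no interactions.
mdl <- lm(mass_g ~ length_cm + height_cm + species + 0, data = fish_demo)

# Step 2: build every combination of explanatory values to predict on.
explanatory_data <- expand.grid(
  length_cm = seq(10, 60, by = 10),
  height_cm = seq(5, 15, by = 5),
  species = unique(fish_demo$species),
  stringsAsFactors = FALSE
)

# Step 3: add a column of predictions, named after the response.
prediction_data <- explanatory_data
prediction_data$mass_g <- predict(mdl, newdata = explanatory_data)

print(head(prediction_data))
```

The only difference from the two-variable case is the extra argument to the grid; the grid does grow multiplicatively with each added variable, here 6 lengths by 3 heights by 3 species.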

8. Visualizing predictions

Likewise, the plotting code is the same as before. The colors of the square prediction points are similar to the colors of the nearby circular data points, indicating that the model provides a good fit.

9. Let's practice!

Let's make some models!
