
More than two explanatory variables

1. More than two explanatory variables

Regression models aren't limited to two explanatory variables. Here, we'll consider three of them, and think about what happens when you increase that number even further.

2. From last time

In the last video you saw this scatter plot with the response variable, mass, indicated by color, and the explanatory variables shown on the x and y axes. You can see several distinct clusters of points. Perhaps these correspond to the different species of fish. We can check this by faceting on species.

3. Faceting by species

You can give each species its own panel with the FacetGrid class from Seaborn. In the first step, you prepare the grid by specifying the layout: the col argument splits the data by species, and the col_wrap argument wraps the panels into a two-by-two grid. The optional palette argument improves the coloring. In the second step, you map the visualization you want onto the grid. In this case, you specify a scatter plot, with length and height on the x- and y-axis, respectively. There is a noticeably strong positive correlation between length and height for each species of fish. The relationship between the explanatory variables and the response is harder to quantify because you can't read colors as accurately as x and y coordinates. In this example, brighter colors mean heavier fish, so for each species you can see that as fish get longer and taller they also get heavier.

4. Faceting by species

In general, while it is tricky to include more than three numeric variables in a scatter plot, you can include as many categorical variables as you like using faceting. However, more facets can make it harder to see an overall picture. Plotting rapidly becomes more challenging as you increase the number of variables to display.

5. Different levels of interaction

By contrast, modeling doesn't get much harder as you increase the number of explanatory variables. Here, there are three explanatory variables and the only change to the formula is an extra plus followed by the new variable. The main difficulty with including more explanatory variables in models is that there are more options regarding interactions. This model specifies no interactions between variables. You could also include 2-way, or "pairwise", interactions between the explanatory variables. A third option is to include a 3-way interaction between all the explanatory variables. This syntax quickly becomes cumbersome to type, so fortunately there are shortcuts.
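The three levels of interaction can be sketched by writing each formula out in full. This assumes the models are fitted with ols from statsmodels.formula.api and uses a synthetic dataset with hypothetical column names, so treat it as an illustration rather than the code shown on the slide.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic stand-in for the fish dataset (hypothetical columns and values)
rng = np.random.default_rng(0)
species = np.repeat(["Bream", "Perch", "Pike", "Roach"], 20)
length_cm = rng.uniform(10, 60, size=species.size)
height_cm = 0.3 * length_cm + rng.normal(0, 1, size=species.size)
mass_g = 0.5 * length_cm * height_cm + rng.normal(0, 20, size=species.size)
fish = pd.DataFrame({
    "species": species,
    "length_cm": length_cm,
    "height_cm": height_cm,
    "mass_g": mass_g,
})

main_effects = "length_cm + height_cm + species"
two_way = "length_cm:height_cm + length_cm:species + height_cm:species"
three_way = "length_cm:height_cm:species"

# Each formula ends in "+ 0" to drop the global intercept
formulas = {
    "no interactions": f"mass_g ~ {main_effects} + 0",
    "2-way interactions": f"mass_g ~ {main_effects} + {two_way} + 0",
    "3-way interaction": f"mass_g ~ {main_effects} + {two_way} + {three_way} + 0",
}

# Fit each model and record how many coefficients it estimates
n_coefs = {}
for label, formula in formulas.items():
    model = ols(formula, data=fish).fit()
    n_coefs[label] = len(model.params)
    print(f"{label}: {n_coefs[label]} coefficients")
```

The coefficient count grows with each level of interaction, which is one way to see how much extra flexibility the interaction terms add.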

6. All the interactions

You've already seen the concise syntax for including all the interactions. Simply swap the plus operators between the explanatory variables for times symbols, written as asterisks. Both these formulas mean the same thing. You still need a plus before the zero to denote not including a global intercept term.
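One way to check that the two spellings are equivalent, without fitting anything, is to compare the terms each formula expands to. The sketch below assumes the formulas are parsed by patsy (the formula library statsmodels has relied on); the variable names are hypothetical.

```python
from patsy import ModelDesc

# Fully written-out formula: main effects, 2-way and 3-way interactions
explicit = (
    "mass_g ~ length_cm + height_cm + species "
    "+ length_cm:height_cm + length_cm:species + height_cm:species "
    "+ length_cm:height_cm:species + 0"
)
# Concise formula: plus operators between variables swapped for times symbols
concise = "mass_g ~ length_cm * height_cm * species + 0"

# Parse both formulas and collect their right-hand-side terms
explicit_terms = set(ModelDesc.from_formula(explicit).rhs_termlist)
concise_terms = set(ModelDesc.from_formula(concise).rhs_termlist)

# List what the concise formula expands to, then compare the two sets
for term in ModelDesc.from_formula(concise).rhs_termlist:
    print(term.name())
same_terms = explicit_terms == concise_terms
print("Same terms:", same_terms)
```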

7. Only two-way interactions

To get only 2-way interactions in the model, but not the 3-way interaction, you can use a new syntax, namely wrapping the explanatory variables in parentheses and raising them to the power of two with two asterisks.
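A minimal sketch of the power syntax, again assuming statsmodels' ols and a synthetic dataset with hypothetical column names. Counting the colons in each coefficient name gives the order of the highest interaction the model contains.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic stand-in for the fish dataset (hypothetical columns and values)
rng = np.random.default_rng(0)
species = np.repeat(["Bream", "Perch", "Pike", "Roach"], 20)
length_cm = rng.uniform(10, 60, size=species.size)
height_cm = 0.3 * length_cm + rng.normal(0, 1, size=species.size)
mass_g = 0.5 * length_cm * height_cm + rng.normal(0, 20, size=species.size)
fish = pd.DataFrame({
    "species": species,
    "length_cm": length_cm,
    "height_cm": height_cm,
    "mass_g": mass_g,
})

# Parentheses plus "** 2" request main effects and all pairwise interactions
model = ols(
    "mass_g ~ (length_cm + height_cm + species) ** 2 + 0", data=fish
).fit()

# A name such as "length_cm:height_cm" has one colon, so it is a 2-way term
max_order = max(name.count(":") + 1 for name in model.params.index)
print(list(model.params.index))
print("Highest interaction order:", max_order)
```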

8. The prediction flow

The prediction flow with an extra variable contains no surprises. It builds upon what you've seen before: the product function from itertools can easily take additional arrays as arguments. Modeling code scales nicely with more variables. Notice, however, how rapidly the number of rows in the prediction dataset increases to account for all combinations. Visualizing these predictions is less useful because it reaches the limit of visual interpretation, so we stick with predicting the response variable instead.
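The flow can be sketched as follows, assuming a model fitted with statsmodels' ols on a synthetic dataset. The column names and the grid values for length and height are hypothetical choices for illustration.

```python
from itertools import product

import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic stand-in for the fish dataset (hypothetical columns and values)
rng = np.random.default_rng(0)
species = np.repeat(["Bream", "Perch", "Pike", "Roach"], 20)
length_cm = rng.uniform(10, 60, size=species.size)
height_cm = 0.3 * length_cm + rng.normal(0, 1, size=species.size)
mass_g = 0.5 * length_cm * height_cm + rng.normal(0, 20, size=species.size)
fish = pd.DataFrame({
    "species": species,
    "length_cm": length_cm,
    "height_cm": height_cm,
    "mass_g": mass_g,
})

model = ols("mass_g ~ length_cm + height_cm + species + 0", data=fish).fit()

# One array of candidate values per explanatory variable
length_grid = np.arange(5, 61, 5)        # 12 values
height_grid = np.arange(2, 21, 2)        # 10 values
species_grid = fish["species"].unique()  # 4 values

# product() yields every combination, so the row count is 12 * 10 * 4
combinations = product(length_grid, height_grid, species_grid)
explanatory_data = pd.DataFrame(
    combinations, columns=["length_cm", "height_cm", "species"]
)

# Add the predicted response as a new column
prediction_data = explanatory_data.assign(
    mass_g=model.predict(explanatory_data)
)
print(prediction_data.shape)
```

Adding a fourth explanatory variable would multiply the row count again by the number of values in its array, which is why the prediction dataset grows so quickly.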

9. Let's practice!

Time for you to make some models!