1. Two numeric explanatory variables
In the previous chapters, the models had one numeric and one categorical explanatory variable. Let's see what changes if you have two numeric explanatory variables instead.
2. Visualizing three numeric variables
Two numeric explanatory variables plus a numeric response variable gives three numeric variables to plot. Since scatter plots are designed to show the relationship between two numeric variables, this requires some extra thought. There are two common choices.
Either we draw a 3D scatter plot, or we draw a 2D scatter plot and use color for the response variable.
3. Another column for the fish dataset
Let's revisit the fish dataset, which now has an extra numeric column, the height of the fish in centimeters.
4. 3D scatter plot
There are multiple options for drawing a 3D plot in Python, but the details are beyond the scope of this course. The main issue is that a static 3D plot is very hard to interpret.
The fundamental problem is that screens are two-dimensional, so 3D plots always suffer from perspective issues. The only way to circumvent this is to create an interactive plot that the audience can rotate to explore the data from different angles.
Perhaps virtual reality will help solve this one day, but for now let's move on to the next type of plot.
5. 2D scatter plot, color for response
The next plot type to explore uses color for the response variable. This is a standard 2D scatter plot, so we can use seaborn, setting the hue argument to mass_g. In terms of interpretation, it's an improvement over the 3D scatter plot: as you move up and to the right on the plot, the colors get darker, representing heavier fish.
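Here's a minimal sketch of that plot, assuming the dataset is loaded into a pandas DataFrame called fish with columns length_cm, height_cm, and mass_g (these names, and the file path, are assumptions rather than the course's own code).

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    fish = pd.read_csv("fish.csv")  # hypothetical path to the dataset

    # 2D scatter plot of the two explanatory variables,
    # with color (hue) encoding the response variable.
    sns.scatterplot(x="length_cm", y="height_cm", hue="mass_g", data=fish)
    plt.show()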
6. Modeling with two numeric explanatory variables
Modeling with an extra explanatory variable isn't much different from what we've seen previously. The explanatory variables on the right of the formula are separated with a plus, as before.
You get a global intercept coefficient, and one slope coefficient for each explanatory variable.
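As a sketch, continuing with the assumed fish DataFrame from above (the model name is mine):

    from statsmodels.formula.api import ols

    # One global intercept plus one slope per explanatory variable.
    mdl_mass_vs_both = ols("mass_g ~ length_cm + height_cm", data=fish).fit()
    print(mdl_mass_vs_both.params)

The params attribute then shows an Intercept coefficient and a slope for each of length_cm and height_cm.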
7. The prediction flow
The prediction flow is no different. Create a DataFrame of explanatory values with product from itertools, then add a column of predictions with assign and predict.
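A sketch of that flow, continuing from the model above; the grid ranges here are illustrative, not taken from the course.

    import numpy as np
    import pandas as pd
    from itertools import product

    # All combinations of the two explanatory variables.
    length_cm = np.arange(5, 61, 5)
    height_cm = np.arange(2, 21, 2)
    p = product(length_cm, height_cm)

    explanatory_data = pd.DataFrame(list(p), columns=["length_cm", "height_cm"])

    # Add a column of predictions with assign and predict.
    prediction_data = explanatory_data.assign(
        mass_g=mdl_mass_vs_both.predict(explanatory_data)
    )
    print(prediction_data.head())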
8. Plotting the predictions
The plotting code also remains largely the same. I create two scatter plots: one with the actual data points, and one with the prediction data points. To avoid a duplicate legend, the legend can be turned off in one of the scatterplot calls. For clarity, I also changed the prediction data point markers to squares with the marker argument.
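Sketched with the same assumed names as before:

    import seaborn as sns
    import matplotlib.pyplot as plt

    # Actual data points, drawn as circles, with a legend.
    sns.scatterplot(x="length_cm", y="height_cm", hue="mass_g", data=fish)

    # Prediction points as squares; legend=False avoids a duplicate legend.
    sns.scatterplot(
        x="length_cm", y="height_cm", hue="mass_g",
        data=prediction_data, marker="s", legend=False,
    )
    plt.show()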
The results look like this. The color grid gives a nice overview of how the response variable changes over the plane of the explanatory variables. The heaviest fish are in the top-right, where they are long and tall.
9. Including an interaction
To include an interaction in the model, the only change is to replace the plus in the formula with a times. This gives you one extra slope term for the effect of the interaction between the two explanatory variables.
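A sketch, again with the assumed fish DataFrame (the model name is mine):

    from statsmodels.formula.api import ols

    # The * operator expands to length_cm + height_cm + length_cm:height_cm,
    # so the model gains one extra slope for the interaction term.
    mdl_mass_vs_both_inter = ols(
        "mass_g ~ length_cm * height_cm", data=fish
    ).fit()
    print(mdl_mass_vs_both_inter.params)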
10. The prediction flow with an interaction
The prediction flow is exactly the same as before. Repetitive, but pleasingly easy. The only thing that has changed is the name of the model.
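Continuing the sketch, only the model object changes:

    # Same explanatory grid as before; predict with the interaction model.
    prediction_data = explanatory_data.assign(
        mass_g=mdl_mass_vs_both_inter.predict(explanatory_data)
    )
    print(prediction_data.head())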
11. Plotting the predictions
The plotting code is identical, but the colors on the plot are slightly different.
In this case, the colors of the square prediction points closely match the colors of the nearby circular data points, which is a nice visual indicator that the model is a good fit.
12. Let's practice!
Time to apply all this to the housing dataset.