Two numeric explanatory variables
1. Two numeric explanatory variables
In the previous chapters, the models had one numeric and one categorical explanatory variable. Let's see what changes if you have two numeric explanatory variables instead.
2. Visualizing 3 numeric variables
Two numeric explanatory variables plus a numeric response variable gives three numeric variables to plot. Since scatter plots are designed to show relationships between two numeric variables, this takes a bit more thought. There are two common choices: a 3D scatter plot, or a 2D scatter plot that uses color for the response variable.
3. Another column for the fish dataset
Let's revisit the fish dataset, which now has an extra numeric column, the height of the fish in centimeters.
4. 3D scatter plot
To make a 3D scatter plot, you can use scatter3D from the plot3D package. Unlike ggplot2, this simply requires three numeric vectors for the x, y, and z coordinates. Writing the name of the dataset and using the dollar operator to access columns three times is tedious, so for code like this I prefer magrittr's dollar pipe, %$%. The piped version does the same thing, but is easier to write and read. The dollar pipe exposes the columns of the data frame on its left, so the arguments in the function call on its right can refer to those columns by name.
5. 3D scatter plot
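As a rough sketch of the two ways of calling scatter3D described here, the code below builds a small made-up data frame and draws the same plot twice. The column names length_cm, height_cm, and mass_g and all of the values are assumptions for illustration only; the real fish dataset may differ.

```r
library(plot3D)    # provides scatter3D()
library(magrittr)  # provides the %$% dollar (exposition) pipe

# Hypothetical stand-in for the fish dataset; names and values are made up.
fish <- data.frame(
  length_cm = c(10, 15, 20, 25, 30, 35, 40, 45),
  height_cm = c(3, 5, 6, 8, 9, 12, 13, 15),
  mass_g    = c(20, 70, 150, 300, 480, 800, 1050, 1500)
)

# Verbose version: repeat the data frame name for every coordinate.
scatter3D(fish$length_cm, fish$height_cm, fish$mass_g)

# Dollar pipe version: column names are looked up inside fish.
fish %$% scatter3D(length_cm, height_cm, mass_g)
```

Both calls pass the same three vectors, so the two plots should be identical; only the way the columns are referenced changes.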
The code is nicer now, but there is a bigger problem: the plot is impossible to interpret. Some cleanup work like labeling axes can make things easier, but the fundamental problem is that screens are two dimensional, so 3D plots always suffer from perspective issues. The only way to circumvent this is to create an interactive plot that the audience can rotate to explore the data from different angles. Maybe virtual reality will solve this one day, but for now let's move on to the next type of plot.
6. 2D scatter plot, color for response
The next plot type to explore uses color for the response variable. This is a standard 2D scatter plot, so we can use ggplot2. It's an improvement, but ggplot2's default color scale makes it hard to tell the different shades of blue apart.
7. Viridis color scales
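A minimal sketch of the color-for-response scatter plot, first with ggplot2's default continuous scale and then with a viridis scale swapped in. The data frame and its column names (length_cm, height_cm, mass_g) are made-up stand-ins, not the real fish dataset.

```r
library(ggplot2)

# Hypothetical stand-in for the fish dataset; names and values are made up.
fish <- data.frame(
  length_cm = c(10, 15, 20, 25, 30, 35, 40, 45),
  height_cm = c(3, 5, 6, 8, 9, 12, 13, 15),
  mass_g    = c(20, 70, 150, 300, 480, 800, 1050, 1500)
)

# Two explanatory variables on the axes, response mapped to color.
# With no scale specified, ggplot2 uses its default gradient of blues.
plt_default <- ggplot(fish, aes(length_cm, height_cm, color = mass_g)) +
  geom_point()

# Same plot, replacing the default scale with the viridis "inferno" palette.
plt_inferno <- plt_default +
  scale_color_viridis_c(option = "inferno")

print(plt_inferno)
```

Keeping the base plot in its own object makes it easy to compare scales by adding a different scale_color_* layer to the same starting point.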
ggplot2 has a set of color scales called "viridis" that provide easier to distinguish colors. scale_color_viridis_c is used for continuous scales, where you have numeric data. This plot uses the "inferno" palette option, which moves from black through purple and red to yellow. As you move up and to the right in the plot, the colors get brighter, representing heavier fish.
8. Modeling with 2 numeric explanatory variables
Although plotting was harder with this extra explanatory variable, modeling isn't much different. The explanatory variables on the right of the formula are separated with a plus, as before. You get a global intercept coefficient, and one slope coefficient for each explanatory variable.
9. The prediction flow
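As a sketch of the model with two numeric explanatory variables, the code below fits a linear regression with lm on a made-up data frame. The model name mdl_mass_vs_both and the column names are hypothetical, chosen only for illustration.

```r
# Hypothetical stand-in for the fish dataset; names and values are made up.
fish <- data.frame(
  length_cm = c(10, 15, 20, 25, 30, 35, 40, 45),
  height_cm = c(3, 5, 6, 8, 9, 12, 13, 15),
  mass_g    = c(20, 70, 150, 300, 480, 800, 1050, 1500)
)

# Explanatory variables are separated with a plus on the right of the formula.
mdl_mass_vs_both <- lm(mass_g ~ length_cm + height_cm, data = fish)

# One global intercept, plus one slope per explanatory variable.
coef(mdl_mass_vs_both)
```

The coefficient vector has three entries: the intercept, the slope for length_cm, and the slope for height_cm.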
The prediction flow is no different. Create a grid of explanatory values with expand_grid, then add a column of predictions with mutate and predict.
10. Plotting the predictions
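The prediction flow and the plot of its results might look something like the sketch below. The data, the grid ranges, and the object names (explanatory_data, prediction_data, mdl_mass_vs_both) are assumptions for illustration; drawing the predictions as squares is one way to keep them distinct from the circular data points.

```r
library(dplyr)    # mutate() and the %>% pipe
library(tidyr)    # expand_grid()
library(ggplot2)

# Hypothetical stand-in for the fish dataset; names and values are made up.
fish <- data.frame(
  length_cm = c(10, 15, 20, 25, 30, 35, 40, 45),
  height_cm = c(3, 5, 6, 8, 9, 12, 13, 15),
  mass_g    = c(20, 70, 150, 300, 480, 800, 1050, 1500)
)
mdl_mass_vs_both <- lm(mass_g ~ length_cm + height_cm, data = fish)

# Step 1: every combination of the chosen explanatory values.
explanatory_data <- expand_grid(
  length_cm = seq(10, 45, by = 5),
  height_cm = seq(3, 15, by = 3)
)

# Step 2: add a column of predicted masses.
prediction_data <- explanatory_data %>%
  mutate(mass_g = predict(mdl_mass_vs_both, explanatory_data))

# Step 3: data as circles, predictions as larger squares, same color scale.
plt <- ggplot(fish, aes(length_cm, height_cm, color = mass_g)) +
  geom_point() +
  scale_color_viridis_c(option = "inferno") +
  geom_point(data = prediction_data, shape = 15, size = 3)

print(plt)
```

Because the prediction column has the same name as the response column, the second geom_point layer inherits the color mapping without any extra aesthetics.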
The plotting code also remains the same, though the results look different. The color grid gives a nice overview of how the response variable changes over the plane of the explanatory variables. The heaviest fish are in the top-right, where they are long and tall.
11. Including an interaction
To include an interaction in the model, the only change is to replace the plus in the formula with a times symbol (an asterisk). This gives you one extra slope term for the effect of the interaction between the two explanatory variables.
12. The prediction flow again
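A minimal sketch of the interaction model, again on a made-up data frame with hypothetical column and model names. The only change from a model without an interaction is the asterisk in the formula.

```r
# Hypothetical stand-in for the fish dataset; names and values are made up.
fish <- data.frame(
  length_cm = c(10, 15, 20, 25, 30, 35, 40, 45),
  height_cm = c(3, 5, 6, 8, 9, 12, 13, 15),
  mass_g    = c(20, 70, 150, 300, 480, 800, 1050, 1500)
)

# The asterisk expands to both main effects plus their interaction.
mdl_mass_vs_both_inter <- lm(mass_g ~ length_cm * height_cm, data = fish)

# Intercept, two slopes, and one extra term named length_cm:height_cm.
coef(mdl_mass_vs_both_inter)
```

In R's formula syntax, length_cm * height_cm is shorthand for length_cm + height_cm + length_cm:height_cm, which is why exactly one additional coefficient appears.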
The prediction flow is exactly the same as before. Boring, but pleasingly easy.
13. Plotting the predictions
The plotting code is identical, but the colors on the plot are slightly different. In this case, the colors of the square prediction points closely match the colors of the nearby circular data points, which is a nice visual indicator that the model is a good fit.
14. Let's practice!
Back to the housing dataset.