1. Faceting plotly graphics
Creating a series of subplots is another powerful way to explore the impact of additional variables.
2. 2016 video game sales
We'll use subplots to explore the vgsales2016 dataset, which contains information about video games released in 2016, including sales and ratings. The full dataset can be found on Kaggle.
3. Representing many categories
In chapter 2, you created a scatterplot exploring the relationship between user and critic scores, where color represented genre. There are 11 video game genres in 2016, making it difficult to choose a color palette that makes visual comparison easy. In fact, if you run this code in your console, you'll receive a warning message that the default palette only has 8 colors.
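As a rough sketch of the scatterplot described above, the code below maps genre to color. The `games` data frame is a small made-up stand-in for vgsales2016, and the column names `Critic_Score` and `User_Score` are assumptions; only `Genre` is named in this lesson. With three genres no palette warning appears; it is expected only once the number of categories exceeds the palette's eight colors.

```r
library(plotly)

# Hypothetical stand-in for vgsales2016;
# Critic_Score and User_Score are assumed column names
games <- data.frame(
  Genre = rep(c("Action", "Adventure", "Sports"), each = 3),
  Critic_Score = c(80, 65, 72, 55, 90, 61, 77, 83, 49),
  User_Score = c(7.9, 6.1, 7.0, 5.8, 8.8, 6.5, 7.4, 8.1, 5.2)
)

# One scatterplot, with genre mapped to marker color
p <- games %>%
  plot_ly(x = ~Critic_Score, y = ~User_Score, color = ~Genre) %>%
  add_markers()
p
```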
To overcome this difficulty, we can create multiple graphs of the dataset, one for each category, arranged in a series. These are often called small multiples, trellis graphics, faceted graphics, or subplots.
4. A single subplot
To understand how subplots are constructed, let’s create a plot for only the action genre.
First, we extract the rows corresponding to action games using filter(Genre == "Action"). This new data frame, action_df, only contains 178 observations.
5. A single subplot
Next, we pipe action_df into our scatterplot code to plot the 178 observations.
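The two steps for a single subplot might look like the sketch below. It uses a small made-up `games` data frame in place of vgsales2016, so the filtered result has 3 rows rather than 178, and the score column names are assumptions.

```r
library(dplyr)
library(plotly)

# Hypothetical stand-in for vgsales2016;
# Critic_Score and User_Score are assumed column names
games <- data.frame(
  Genre = rep(c("Action", "Adventure", "Sports"), each = 3),
  Critic_Score = c(80, 65, 72, 55, 90, 61, 77, 83, 49),
  User_Score = c(7.9, 6.1, 7.0, 5.8, 8.8, 6.5, 7.4, 8.1, 5.2)
)

# Step 1: keep only the rows for action games
action_df <- games %>%
  filter(Genre == "Action")

# Step 2: pipe the subset into the scatterplot code
p <- action_df %>%
  plot_ly(x = ~Critic_Score, y = ~User_Score) %>%
  add_markers()
p
```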
6. Two subplots
Now, let's consider how to create a series of subplots.
When there are only a few subplots, this can be done manually by storing each plot as an object and combining them using the subplot() function.
For example, we can store the action scatterplot we just created as p1 and create another scatterplot for adventure games, storing this plot as p2.
Finally, we use the subplot function to combine p1 and p2 into a grid of plots. Here we specify nrows = 1 to produce a single row of plots.
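A minimal sketch of the manual approach follows, again with a made-up `games` data frame and assumed score column names. The object names `p1` and `p2` follow the narration.

```r
library(dplyr)
library(plotly)

# Hypothetical stand-in for vgsales2016;
# Critic_Score and User_Score are assumed column names
games <- data.frame(
  Genre = rep(c("Action", "Adventure", "Sports"), each = 3),
  Critic_Score = c(80, 65, 72, 55, 90, 61, 77, 83, 49),
  User_Score = c(7.9, 6.1, 7.0, 5.8, 8.8, 6.5, 7.4, 8.1, 5.2)
)

# Scatterplot for action games
p1 <- games %>%
  filter(Genre == "Action") %>%
  plot_ly(x = ~Critic_Score, y = ~User_Score) %>%
  add_markers()

# Scatterplot for adventure games
p2 <- games %>%
  filter(Genre == "Adventure") %>%
  plot_ly(x = ~Critic_Score, y = ~User_Score) %>%
  add_markers()

# Combine the two plots side by side in a single row
both <- subplot(p1, p2, nrows = 1)
both
```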
Notice that when we create subplots, the default legend simply gives numeric labels to the traces, which is extremely uninformative.
7. Legends
To add an informative legend to the subplot, we map genre to the name of the trace. Now we have Action and Adventure in the legend, providing far more information.
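One way to name the traces is to pass `name = ~Genre` when adding the markers, as in this sketch. The `games` data frame and its score columns are hypothetical stand-ins for vgsales2016.

```r
library(dplyr)
library(plotly)

# Hypothetical stand-in for vgsales2016;
# Critic_Score and User_Score are assumed column names
games <- data.frame(
  Genre = rep(c("Action", "Adventure", "Sports"), each = 3),
  Critic_Score = c(80, 65, 72, 55, 90, 61, 77, 83, 49),
  User_Score = c(7.9, 6.1, 7.0, 5.8, 8.8, 6.5, 7.4, 8.1, 5.2)
)

# Map Genre to the trace name so the legend shows the genre
p1 <- games %>%
  filter(Genre == "Action") %>%
  plot_ly(x = ~Critic_Score, y = ~User_Score) %>%
  add_markers(name = ~Genre)

p2 <- games %>%
  filter(Genre == "Adventure") %>%
  plot_ly(x = ~Critic_Score, y = ~User_Score) %>%
  add_markers(name = ~Genre)

labelled <- subplot(p1, p2, nrows = 1)
labelled
```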
8. Axis labels
Now that we have an informative legend, it's time to add informative axis labels.
If the subplots share the same x- and y-axes, as they do in faceted plots, then adding shareX = TRUE and shareY = TRUE to the subplot() command will add axis labels for the variable names.
Additionally, sharing an axis allows interactivity to be linked. For example, if we zoom in on the y-axis on the left plot, the y-axis will be restricted on both plots.
If this isn't the desired behavior, leave shareX and shareY at their default of FALSE and set titleX = TRUE and titleY = TRUE instead, which keeps each subplot's own axis titles without linking the axes.
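The two options can be compared side by side in the sketch below. The data frame and score column names are made up for illustration; `shareX`, `shareY`, `titleX`, and `titleY` are logical arguments of `subplot()`.

```r
library(dplyr)
library(plotly)

# Hypothetical stand-in for vgsales2016;
# Critic_Score and User_Score are assumed column names
games <- data.frame(
  Genre = rep(c("Action", "Adventure", "Sports"), each = 3),
  Critic_Score = c(80, 65, 72, 55, 90, 61, 77, 83, 49),
  User_Score = c(7.9, 6.1, 7.0, 5.8, 8.8, 6.5, 7.4, 8.1, 5.2)
)

p1 <- games %>%
  filter(Genre == "Action") %>%
  plot_ly(x = ~Critic_Score, y = ~User_Score) %>%
  add_markers(name = ~Genre)

p2 <- games %>%
  filter(Genre == "Adventure") %>%
  plot_ly(x = ~Critic_Score, y = ~User_Score) %>%
  add_markers(name = ~Genre)

# Option 1: shared axes, which also links zooming across the plots
shared <- subplot(p1, p2, nrows = 1, shareX = TRUE, shareY = TRUE)

# Option 2: independent axes, but keep the axis titles on each plot
titled <- subplot(p1, p2, nrows = 1, titleX = TRUE, titleY = TRUE)

shared
```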
9. Iterate to automate
Manually creating facets is tedious and, as with all copy-and-paste solutions, is error-prone. A better approach is to automate this split and plot procedure.
Let's see how to create this faceted scatterplot of user score against critic score for all genres using tidyverse tools.
10. Iterate to automate
To begin, we load the tidyverse because we need tools from the dplyr, tidyr, and purrr packages.
Then, we pipe the dataset into `group_by()` to create subsets for each genre and `nest()` the results.
This produces a tibble with a column for Genre and a column called data that houses the remaining data for each genre.
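The grouping and nesting step can be sketched as follows. The `games` data frame is a made-up stand-in for vgsales2016 with assumed score column names, so the nested tibble here has three rows instead of one per 2016 genre.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for vgsales2016;
# Critic_Score and User_Score are assumed column names
games <- data.frame(
  Genre = rep(c("Action", "Adventure", "Sports"), each = 3),
  Critic_Score = c(80, 65, 72, 55, 90, 61, 77, 83, 49),
  User_Score = c(7.9, 6.1, 7.0, 5.8, 8.8, 6.5, 7.4, 8.1, 5.2)
)

# One row per genre; the remaining columns are packed into a list-column named data
nested <- games %>%
  group_by(Genre) %>%
  nest()

nested
```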
11. Iterate to automate
Next, we add a plot column to store the plotly objects for each Genre using `mutate()`.
To create a plotly object for each Genre, we use the `map2()` function from the purrr package. `map2()` allows us to iterate over two arguments in parallel. In our example, it iterates over the elements of the data and Genre columns in the tibble. For each data-genre pair, it then creates a plotly object.
To create this plotly object, we write an anonymous function that takes data and Genre as inputs. As of R 4.1, we can write such an anonymous function in shorthand, which you see here. \(data, Genre) begins the definition of the anonymous function, and we then use our typical plotly code as the body of the function. To be more verbose, you can write the word function in place of the backslash, but I show the backslash usage here to align with the help files.
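A sketch of the `mutate()` and `map2()` step is shown below, assuming R 4.1 or later for the backslash shorthand. The data frame and score column names are hypothetical. Inside the anonymous function, `Genre` is a single string, so it is passed to `name` directly rather than as a formula.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(plotly)

# Hypothetical stand-in for vgsales2016;
# Critic_Score and User_Score are assumed column names
games <- data.frame(
  Genre = rep(c("Action", "Adventure", "Sports"), each = 3),
  Critic_Score = c(80, 65, 72, 55, 90, 61, 77, 83, 49),
  User_Score = c(7.9, 6.1, 7.0, 5.8, 8.8, 6.5, 7.4, 8.1, 5.2)
)

with_plots <- games %>%
  group_by(Genre) %>%
  nest() %>%
  # Build one scatterplot per genre and store it in a new list-column
  mutate(
    plot = map2(
      data, Genre,
      \(data, Genre) plot_ly(data = data, x = ~Critic_Score, y = ~User_Score) %>%
        add_markers(name = Genre)
    )
  )

# Equivalent verbose form of the anonymous function header:
# function(data, Genre) plot_ly(...)
with_plots
```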
12. Iterate to automate
Finally, we pipe the results into subplot() to produce the graphic, specifying nrows = 2 to produce a grid of subplots with two rows.
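Putting the steps together, the whole pipeline might look like the sketch below. It relies on `subplot()` accepting a tibble that has one list-column of plotly objects; the toy data and score column names are assumptions, so the grid here has three panels rather than eleven.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(plotly)

# Hypothetical stand-in for vgsales2016;
# Critic_Score and User_Score are assumed column names
games <- data.frame(
  Genre = rep(c("Action", "Adventure", "Sports"), each = 3),
  Critic_Score = c(80, 65, 72, 55, 90, 61, 77, 83, 49),
  User_Score = c(7.9, 6.1, 7.0, 5.8, 8.8, 6.5, 7.4, 8.1, 5.2)
)

facets <- games %>%
  group_by(Genre) %>%
  nest() %>%
  mutate(
    plot = map2(
      data, Genre,
      \(data, Genre) plot_ly(data = data, x = ~Critic_Score, y = ~User_Score) %>%
        add_markers(name = Genre)
    )
  ) %>%
  # Arrange the per-genre plots in a grid with two rows
  subplot(nrows = 2)

facets
```

Adding shareX = TRUE and shareY = TRUE to the final `subplot()` call would label the axes with the variable names, as described earlier.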
13. Let's practice!
Now it's time for you to practice what you've learned.