1. Interactive scatterplot matrices
In the previous lessons, you have considered approaches to visualize relationships between three or four variables using color, size, shape, and subplots. In this lesson, we'll extend these ideas to create scatterplot matrices to explore either quantitative or categorical data.
2. Wine data
In this lesson, we'll return to the wine dataset considered in chapters 1 and 2. Recall that the dataset consists of 13 chemical measurements on three types of wine.
3. The scatterplot matrix
As a first example of a scatterplot matrix, let's consider exploring the pairwise relationships between alcohol content, flavonoids, and color.
Each panel on the diagonal plots a variable against itself, so it always shows a perfectly straight line. For this reason, the diagonal is often omitted.
Looking at the off-diagonals, we quickly see a potentially nonlinear relationship between flavonoids and alcohol, while color is positively associated with alcohol content. We also see an interesting clustering in the plot of flavonoids against color.
4. The template
To create this scatterplot matrix in plotly, we use the following template:
Notice that we no longer specify how we map variables to the plotting canvas in the plot_ly() function. Instead, we pass add_trace() a list of dimensions specifying which variables should be included in the plot.
For each variable, we need to specify two arguments: a string providing the axis label and a mapping specifying which variable in the dataset corresponds to the plotted values.
Finally, notice that we must specify that we are using the "splom" trace type, which is short for scatterplot matrix.
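As a rough sketch of the template just described, the code below builds a scatterplot matrix from a small stand-in data frame. The data frame df, its columns x1 to x3, and the axis labels are hypothetical placeholders; only plot_ly(), add_trace(), the "splom" type, and the dimensions list come from the lesson.

```r
library(plotly)

# Hypothetical stand-in data; replace df and x1..x3 with your own data.
df <- data.frame(
  x1 = c(1.0, 2.0, 3.0, 4.0),
  x2 = c(2.1, 3.9, 6.2, 7.8),
  x3 = c(9.0, 7.5, 4.0, 1.0)
)

p <- df %>%
  plot_ly() %>%                 # no variable mappings here
  add_trace(
    type = "splom",             # scatterplot matrix trace type
    dimensions = list(
      # one inner list per variable: an axis label and a formula
      # naming the column that supplies the plotted values
      list(label = "Variable 1", values = ~x1),
      list(label = "Variable 2", values = ~x2),
      list(label = "Variable 3", values = ~x3)
    )
  )

p
```

Adding another variable to the matrix only requires adding another inner list to dimensions.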
5. Wine SPLOM
Returning to our example, we initialize the plot using the plot_ly() function and add the "splom" trace using the add_trace() function.
To create a scatterplot matrix for Alcohol, Flavonoids, and Color, we pass the dimensions argument a list with the axis label and column name for each variable. Notice that each element of the list passed to the dimensions argument must also be a list.
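Applied to the wine example, the call might look like the sketch below. The wine_toy data frame holds a few made-up placeholder rows so that the code runs on its own; these are not the real measurements, and the column names Alcohol, Flavonoids, and Color are assumed and may be spelled differently in your copy of the dataset.

```r
library(plotly)

# Made-up placeholder rows standing in for the wine data (NOT real values).
# Column names are assumptions; check them against your own dataset.
wine_toy <- data.frame(
  Alcohol    = c(14.2, 13.2, 12.4, 12.3, 13.4, 12.9),
  Flavonoids = c(3.1, 2.8, 2.0, 1.6, 0.6, 0.5),
  Color      = c(5.6, 4.4, 3.0, 2.5, 7.1, 9.0)
)

p <- wine_toy %>%
  plot_ly() %>%
  add_trace(
    type = "splom",
    # dimensions is a list, and each of its elements is itself a list
    dimensions = list(
      list(label = "Alcohol",    values = ~Alcohol),
      list(label = "Flavonoids", values = ~Flavonoids),
      list(label = "Color",      values = ~Color)
    )
  )

p
```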
6. Linked brushing
The panels of a scatterplot matrix in plotly are linked. This means that there is a connection between each panel such that a change to one of the panels is reflected in the other panels.
To better understand this, consider what happens when we use the lasso select tool to highlight the lower cluster in the plot of flavonoids against color. Notice that the selected points are highlighted in every panel.
Be sure to explore how you can interact with these plots. Specifically, check out how the pan, zoom, and selection tools work.
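If you want the lasso tool to be active as soon as the plot loads, one option is to set the drag mode in the layout. This is an optional sketch that goes beyond what the lesson shows; the wine_toy data frame again holds made-up placeholder values, and its column names are assumptions.

```r
library(plotly)

# Made-up placeholder rows standing in for the wine data (NOT real values).
wine_toy <- data.frame(
  Alcohol    = c(14.2, 13.2, 12.4, 12.3, 13.4, 12.9),
  Flavonoids = c(3.1, 2.8, 2.0, 1.6, 0.6, 0.5),
  Color      = c(5.6, 4.4, 3.0, 2.5, 7.1, 9.0)
)

p <- wine_toy %>%
  plot_ly() %>%
  add_trace(
    type = "splom",
    dimensions = list(
      list(label = "Alcohol",    values = ~Alcohol),
      list(label = "Flavonoids", values = ~Flavonoids),
      list(label = "Color",      values = ~Color)
    )
  ) %>%
  # make lasso selection the default drag tool; pan, zoom, and box
  # select remain available from the modebar
  layout(dragmode = "lasso")

p
```

With this setting, dragging in any panel draws a lasso, and the selected points are highlighted in the other panels as well.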
7. Adding color
As we saw in chapter 2, using color to represent the wine type helped explain some of the structure observed in the scatterplot of Alcohol content against Flavonoids. Let's use color to add wine type to this scatterplot matrix.
Since we want to map wine type to color in each dimension of the scatterplot matrix, we map Type to color in the plot_ly() layer.
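A minimal sketch of this step is shown below. As before, wine_toy contains made-up placeholder rows rather than real measurements, and the column names, including Type, are assumed.

```r
library(plotly)

# Made-up placeholder rows standing in for the wine data (NOT real values).
wine_toy <- data.frame(
  Type       = factor(c(1, 1, 2, 2, 3, 3)),   # three wine types
  Alcohol    = c(14.2, 13.2, 12.4, 12.3, 13.4, 12.9),
  Flavonoids = c(3.1, 2.8, 2.0, 1.6, 0.6, 0.5),
  Color      = c(5.6, 4.4, 3.0, 2.5, 7.1, 9.0)
)

p <- wine_toy %>%
  plot_ly(color = ~Type) %>%    # color mapping set once, in plot_ly()
  add_trace(
    type = "splom",
    dimensions = list(
      list(label = "Alcohol",    values = ~Alcohol),
      list(label = "Flavonoids", values = ~Flavonoids),
      list(label = "Color",      values = ~Color)
    )
  )

p
```

Because the color mapping is set in plot_ly(), it applies to every panel of the matrix rather than to a single pair of variables.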
8. Adding color
After adding wine type to the plot, we quickly see that wine type explains both the nonlinear relationship between alcohol content and flavonoids and the clustering in the plot of flavonoids against color.
9. Let's practice!
Now that you've seen how to create a scatterplot matrix, it's time to practice.