
The facets layer

1. The facets layer

The next layer we'll take a look at is the facets layer.

2. Facets

Facets are a straightforward and very useful tool in data visualization. They are based on the concept of small multiples, popularized by Edward Tufte in his 1983 book The Visual Display of Quantitative Information.

3. The idea behind facets

The idea is that we can split up a large, complex plot to produce multiple smaller plots

4. The idea behind facets

that share exactly the same coordinate system. Each plot shows a different subset of the data, so the subsets are easier to compare.

5. iris.wide

Here we use the iris.wide data frame to produce a scatter plot. Because the data also record each flower's species, we can add a facet_grid layer to bring that extra variable into the plot.

6. iris.wide & facet_grid()

facet_grid has arguments for splitting a plot into rows or columns according to a variable supplied inside the vars function. Here, we're splitting into columns using the cols argument.
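Here is a minimal sketch of that column split, assuming ggplot2 is installed. The course's iris.wide data frame isn't shipped with R, so the code rebuilds a hypothetical stand-in from the built-in iris data, with one row per flower part and the columns Species, Part, Length and Width; that layout is an assumption, not the course's exact object.

```r
library(ggplot2)

# Hypothetical stand-in for iris.wide (assumed layout): one row per
# flower part, with its length and width side by side
iris.wide <- rbind(
  data.frame(Species = iris$Species, Part = "Sepal",
             Length = iris$Sepal.Length, Width = iris$Sepal.Width),
  data.frame(Species = iris$Species, Part = "Petal",
             Length = iris$Petal.Length, Width = iris$Petal.Width)
)

# Scatter plot of width against length, split into one column per species
p <- ggplot(iris.wide, aes(x = Length, y = Width, color = Part)) +
  geom_point(alpha = 0.6) +
  facet_grid(cols = vars(Species))

print(p)
```

Each of the three panels uses the same x and y scales, so the point clouds for the three species can be compared directly.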

7. Formula notation

The same result can be achieved with formula notation, which you may be familiar with from defining linear models with the lm function. Everything on the left of the tilde (~) splits the plot into rows, and everything on the right splits it into columns; a dot (.) on either side means no split in that direction. So the primary use of facets is to add another categorical variable to your plot, but they also aid visual perception.
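To make the formula notation concrete, this sketch uses the built-in iris data (not iris.wide) so it runs on its own. It builds the same scatter plot twice, once split into columns and once into rows, and then inspects the resulting panel grid. Reading the grid through `ggplot_build(p)$layout$layout` relies on ggplot2's internal build object, so treat that part as an assumption that may vary between ggplot2 versions.

```r
library(ggplot2)

# Built-in iris data: one scatter plot to be split by species
base <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

# Left of ~ splits into rows, right of ~ into columns; "." means no split
p_cols <- base + facet_grid(. ~ Species)  # like facet_grid(cols = vars(Species))
p_rows <- base + facet_grid(Species ~ .)  # like facet_grid(rows = vars(Species))

# Inspect the panel grid each specification produces, without drawing it
layout_cols <- ggplot_build(p_cols)$layout$layout
layout_rows <- ggplot_build(p_rows)$layout$layout
print(layout_cols[, c("PANEL", "ROW", "COL")])
print(layout_rows[, c("PANEL", "ROW", "COL")])
```

The first layout has one row and three columns; the second has three rows and one column.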

8. iris.wide2

For example, we already saw where this is useful in the first course, when we talked about data structure. We used the iris.wide2 data frame to produce three separate plots. The problem was that each plot was drawn on its own y-axis, and we needed three separate plotting calls to produce them.

9. iris.tidy

When we use the iris.tidy data set, we can take advantage of the facet_grid layer to solve both of these problems. The trick in both of these examples is to understand that facets simply split the overall data set according to the levels of a categorical (factor) variable. If our data is in the right format, we can achieve this easily with either columns or rows.
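The sketch below shows the idea in a single plotting call. Because iris.tidy isn't part of base R, the code builds a hypothetical stand-in from the built-in iris data, assuming the layout is one row per measurement with the columns Species, Part ("Sepal"/"Petal"), Measure ("Length"/"Width") and Value. Faceting on Measure then gives one panel per level, all on a shared y-axis.

```r
library(ggplot2)

# Hypothetical stand-in for iris.tidy (assumed layout): one row per
# measurement, with the original column name split into Part and Measure
long <- stack(iris[, 1:4])  # columns: values, ind (source column name)
iris.tidy <- data.frame(
  Species = rep(iris$Species, times = 4),
  Part    = sub("\\..*$", "", long$ind),  # "Sepal" or "Petal"
  Measure = sub("^.*\\.", "", long$ind),  # "Length" or "Width"
  Value   = long$values
)

# A single plotting call: one panel per level of Measure, sharing a y-axis
p <- ggplot(iris.tidy, aes(x = Species, y = Value, color = Part)) +
  geom_jitter(width = 0.2, alpha = 0.6) +
  facet_grid(cols = vars(Measure))

print(p)
```

If tidyr is available, `tidyr::pivot_longer(iris, -Species, names_to = c("Part", "Measure"), names_sep = "\\.", values_to = "Value")` is an equivalent way to produce the same long layout; base R is used here only to keep the dependencies to ggplot2.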

10. iris.tidy done wrong

In this case it doesn't make sense to split into rows, because the whole point is to aid visual perception and make comparisons. Splitting into columns lets us read the plots from left to right along a single, shared y-axis.
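A small sketch of the contrast, again assuming a hypothetical iris.tidy stand-in built from the built-in iris data (columns Species, Part, Measure, Value). It builds the "done wrong" row split and the column split from the same base plot and compares their panel grids. As before, `ggplot_build(p)$layout$layout` reads ggplot2's internal build object and is an assumption that may change between versions.

```r
library(ggplot2)

# Hypothetical iris.tidy stand-in (assumed layout): one row per measurement
long <- stack(iris[, 1:4])
iris.tidy <- data.frame(
  Species = rep(iris$Species, times = 4),
  Part    = sub("\\..*$", "", long$ind),
  Measure = sub("^.*\\.", "", long$ind),
  Value   = long$values
)

base <- ggplot(iris.tidy, aes(x = Species, y = Value, color = Part)) +
  geom_jitter(width = 0.2, alpha = 0.6)

# Rows: panels are stacked vertically, each with its own y-axis region,
# so Length and Width values no longer sit side by side
p_rows <- base + facet_grid(rows = vars(Measure))
# Columns: panels sit side by side and share one y-axis, read left to right
p_cols <- base + facet_grid(cols = vars(Measure))

rows_layout <- ggplot_build(p_rows)$layout$layout
cols_layout <- ggplot_build(p_cols)$layout$layout
print(p_rows)
print(p_cols)
```

Both versions contain exactly the same data; only the arrangement of the two panels differs, which is what makes the column version easier to read here.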

11. Other options

So the choice depends on what type of data you are using. You can also split by both rows and columns, using two different variables, and if your categorical variable has many levels, you can wrap the subplots into a defined number of columns with facet_wrap.
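Both options can be sketched with the mpg data set that ships with ggplot2; choosing mpg here is an illustration, not the course's example. facet_grid takes one variable for rows and another for columns, while facet_wrap lays the levels of a single variable out in a fixed number of columns. Inspecting the panels via `ggplot_build(p)$layout$layout` relies on ggplot2 internals and is an assumption that may vary by version.

```r
library(ggplot2)

# ggplot2's built-in mpg data: highway mileage against engine displacement
base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Two variables: drive train splits rows, cylinder count splits columns
p_grid <- base + facet_grid(rows = vars(drv), cols = vars(cyl))

# Many levels: wrap the 7 vehicle classes into 3 columns
p_wrap <- base + facet_wrap(vars(class), ncol = 3)

grid_layout <- ggplot_build(p_grid)$layout$layout
wrap_layout <- ggplot_build(p_wrap)$layout$layout
print(p_grid)
print(p_wrap)
```

facet_grid always draws the full grid of row and column levels (3 drive trains by 4 cylinder counts here), even where a combination has no data, whereas facet_wrap draws only the levels that occur and fills them in row by row.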

12. Let's practice!

Alright, let's explore these details and some further subtleties in the exercises.