1. Heatmaps use case scenario
I briefly mentioned before that heat maps are a poor data visualization method. It's surprising, since they are really popular among scientists of all stripes. Let's take a closer look.
2. The barley dataset
Our dataset is a classic in data visualization. We have four variables: the yield of 10 different barley varieties, measured in two years at six different sites.
3. A basic heat map
Our heat map is basically a colored table. Our three categorical variables are mapped onto x, y and facets. The continuous variable, yield, is mapped onto fill.
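As a rough sketch of how a plot like this could be built in ggplot2: the code below assumes the data is the `barley` data frame shipped with the lattice package (columns `yield`, `variety`, `year`, `site`). Which categorical variable goes on which axis is my assumption (year on x, variety on y, site in the facets), since the slide itself isn't reproduced here.

```r
library(ggplot2)
library(lattice)  # assumption: barley comes from lattice (yield, variety, year, site)

# A heat map is a colored table: one tile per variety/year/site combination.
# Assumed mapping: year -> x, variety -> y, site -> facets, yield -> fill.
p_heat <- ggplot(barley, aes(x = year, y = variety, fill = yield)) +
  geom_tile() +
  facet_wrap(~ site, ncol = 1)

p_heat
```

Every tile corresponds to exactly one row of the data, which is why the result reads like a table rather than a summary.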
So what's wrong with this plot?
Well, although there are some exceptions, color on a continuous scale is problematic. Color perception depends on context. Here, each tile sits against a different background of neighboring colors, so the same fill can look lighter or darker depending on where it appears. That means heat maps are not well suited for reading off individual results.
If our categories were clustered in a way that brings out overall trends, then we might make a case for a heat map, because it would at least communicate something. Often this is not the case.
Many times, heat maps look complex and try to impress the viewer with a meaningless "wow" factor.
What would be a better alternative?
4. A dot plot
Here, we map yield onto the x axis and year onto the color scale.
Now we can ask very pointed questions, such as: which variety performed best in a given year? How does a particular variety perform at a given site? To answer these questions, we use a slow, table-lookup type of perception. It's time-consuming, but very useful.
We can also start to see some trends. First, red, 1931, is mostly greater than blue, 1932. Second, the farms are arranged from lowest overall average, Grand Rapids, to highest, Waseca. Third, we can also notice a difference in spread among the farms. These large overall trends are discernible from this visualization, but they take a bit of time to see.
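A dot plot along these lines might look like the sketch below. It again assumes lattice's `barley` data frame. Putting variety on the y axis and site in the facets is my guess at the layout. The manual color scale is only there to match the red-for-1931, blue-for-1932 description above.

```r
library(ggplot2)
library(lattice)  # assumption: barley comes from lattice

# Yield moves to a position scale (x); year moves to color.
p_dot <- ggplot(barley, aes(x = yield, y = variety, color = year)) +
  geom_point() +
  facet_wrap(~ site, ncol = 1) +
  # Named values are matched to the factor levels of year.
  scale_color_manual(values = c("1931" = "red", "1932" = "blue"))

p_dot
```

Because yield is now read from a common axis instead of a fill color, comparing two points within a site no longer depends on what surrounds them.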
How about an alternative?
5. As a time series
Typically, when you have a time scale, the key question is change over time. How do yields differ between the two years?
This line plot shows that change for each variety over time. It has all the same information as the previous plot, but it's more difficult to answer the precise questions from before. However, it is an easier way to see the general trends in the data set. We've increased the speed of our perception.
Notice that there are 10 colors for the 10 varieties. It's getting pretty difficult to distinguish all the colors, and we're at the limits of visual perception.
It still kind of works, but we're starting to push it.
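One way to sketch this line plot, again assuming lattice's `barley`: year goes on the x axis, yield on y, and each variety gets its own colored line. Faceting by site in a single row is my assumption. I also set the x limits explicitly so the years run chronologically, in case the `year` factor stores its levels in a different order.

```r
library(ggplot2)
library(lattice)  # assumption: barley comes from lattice

# group = variety is needed because year is a factor (a discrete x axis);
# without it, geom_line() would not connect the two years for each variety.
p_line <- ggplot(barley, aes(x = year, y = yield,
                             color = variety, group = variety)) +
  geom_line() +
  facet_wrap(~ site, nrow = 1) +
  scale_x_discrete(limits = c("1931", "1932"))  # force chronological order

p_line
```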
6. Using dodged error bars
We can aggregate all the varieties by using their mean, and focus on the farms. We saw how to do this in the stat_summary section. Here I've used error bars with some dodging.
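Here is a minimal sketch of that aggregation, assuming lattice's `barley`. The error measure is not stated here, so mean plus or minus one standard deviation is my assumption. `mean_sd` is a hypothetical helper I define so the example doesn't depend on extra packages. The dodge width is arbitrary. The `fun` argument needs ggplot2 3.3.0 or later (older versions call it `fun.y`).

```r
library(ggplot2)
library(lattice)  # assumption: barley comes from lattice

# Hypothetical helper: mean +/- 1 SD in the y/ymin/ymax format
# that stat_summary(fun.data = ...) expects.
mean_sd <- function(y) {
  data.frame(y = mean(y), ymin = mean(y) - sd(y), ymax = mean(y) + sd(y))
}

posn <- position_dodge(width = 0.2)  # shift sites sideways so bars don't overlap

p_err <- ggplot(barley, aes(x = year, y = yield, color = site, group = site)) +
  stat_summary(fun = mean, geom = "line", position = posn) +
  stat_summary(fun.data = mean_sd, geom = "errorbar",
               width = 0.1, position = posn)

p_err
```

Using the same `position_dodge()` object for both layers keeps each error bar aligned with the end of its line.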
7. Using ribbons for error
Alternatively, we could have used ribbons without dodging. Both dodged error bars and overlapping ribbons work for showing uncertainty; the choice depends on the density of your data and your audience.
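The ribbon version could be sketched like this, under the same assumptions: lattice's `barley`, mean plus or minus one standard deviation as the error measure, and a hypothetical `mean_sd` helper. The ribbons are made mostly transparent so that overlapping sites stay visible. The alpha value is arbitrary.

```r
library(ggplot2)
library(lattice)  # assumption: barley comes from lattice

# Hypothetical helper: mean +/- 1 SD as y/ymin/ymax for stat_summary().
mean_sd <- function(y) {
  data.frame(y = mean(y), ymin = mean(y) - sd(y), ymax = mean(y) + sd(y))
}

p_ribbon <- ggplot(barley, aes(x = year, y = yield,
                               color = site, group = site, fill = site)) +
  stat_summary(fun = mean, geom = "line") +
  # color = NA drops the ribbon outline; only the translucent fill remains.
  stat_summary(fun.data = mean_sd, geom = "ribbon", alpha = 0.1, color = NA)

p_ribbon
```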
In summary, there are many good alternatives to heat maps, depending on the research question and our take-home message.
8. Coding Time!
Let's explore how to produce these plots in the exercises.