Beeswarms and violins
1. Boxplot alternatives
Just adding points behind your boxplots can get you far in your visualizations, but in some scenarios you may need to think outside the box.
2. Limitations of the boxplot with jitter
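As a baseline, here is a minimal sketch of the boxplot-plus-jitter approach in ggplot2. It assumes ggplot2 is installed and uses its built-in mpg dataset (highway mileage, hwy, by vehicle class) purely for illustration; the jitter width and transparency values are arbitrary choices, not from the lesson.

```r
library(ggplot2)

# Illustrative data (assumption): ggplot2's built-in mpg dataset
p_jitter <- ggplot(mpg, aes(x = class, y = hwy)) +
  # Points first, so they sit behind the boxes; height = 0 keeps the y values exact
  geom_jitter(width = 0.2, height = 0, alpha = 0.4) +
  # Unfilled boxes so the points behind them stay visible;
  # outlier.shape = NA avoids drawing outliers a second time
  geom_boxplot(fill = NA, outlier.shape = NA)

p_jitter
```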
Jittering is great for giving a general sense of how much data there is, or of bands of common values, but assessing subtler traits such as skew and bimodality is tricky because the points overlap and the jitter band has a constant width.
3. What are some other options?
So what if a boxplot plus jitter isn't cutting it for your data? Say you have too many data points to get anything beyond a very rough idea of density. There are plenty of options, each with its own strengths and flaws, but in this lesson we will touch on two: the beeswarm plot and the violin plot.
4. Beeswarm plots
Beeswarm plots might sound terrifying, but, unlike a real beeswarm, they are worth embracing when you have a moderate number of observations and want a sense of the density within your groups. They are easy to drop in wherever you would have used a boxplot geometry: just use geom_beeswarm from the ggbeeswarm package.
5. Beeswarm plots
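A minimal sketch of that drop-in swap, assuming the ggplot2 and ggbeeswarm packages are installed and again using ggplot2's built-in mpg data for illustration. The only change from a boxplot version is the geometry.

```r
library(ggplot2)
library(ggbeeswarm)  # provides geom_beeswarm()

# Illustrative data (assumption): ggplot2's built-in mpg dataset.
# Where you might have written geom_boxplot(), use geom_beeswarm() instead.
p_swarm <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_beeswarm()

p_swarm
```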
Beeswarms are a 'smart' version of a jitter plot: every point tries to sit on the line bisecting its group, but points aren't allowed to overlap, so in denser regions they 'stack' out to the sides, revealing the underlying distributional shape.
6. Beeswarm pros
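To make the stacking rule described above concrete, here is a from-scratch sketch in base R. It is a simplified greedy layout, not ggbeeswarm's actual implementation: the helper swarm_offsets() and its placement_order argument are hypothetical, each point is treated as a circle of diameter d in the same units as the data, and each point is placed as close to the central line as it can get without overlapping any point already placed.

```r
# Hypothetical helper (not part of ggbeeswarm): greedy beeswarm layout for one group.
# y: values on the continuous axis; d: point diameter in the same units as y.
# Returns horizontal offsets from the group's central line (x = 0).
swarm_offsets <- function(y, d, placement_order = order(y)) {
  x <- rep(NA_real_, length(y))
  for (i in placement_order) {
    placed <- which(!is.na(x))
    # Only already-placed points closer than d vertically can collide with point i
    near <- placed[abs(y[i] - y[placed]) < d]
    # Each such point forbids the open x-interval (lo, hi) around itself
    half <- sqrt(d^2 - (y[i] - y[near])^2)
    lo <- x[near] - half
    hi <- x[near] + half
    # The closest free spot is either the central line or an edge of a forbidden interval
    cand <- c(0, lo, hi)
    free <- vapply(cand, function(cx) all(cx <= lo | cx >= hi), logical(1))
    cand <- cand[free]
    x[i] <- cand[which.min(abs(cand))]
  }
  x
}

set.seed(42)
y <- rnorm(60)                   # simulated values, for illustration only
x <- swarm_offsets(y, d = 0.15)  # default: place points from the lowest value upward

# Draw each point as a circle of radius d / 2 (asp = 1 keeps the circles round)
plot(x, y, type = "n", asp = 1, xlab = "offset from central line", ylab = "value")
symbols(x, y, circles = rep(0.15 / 2, length(y)), inches = FALSE, add = TRUE)
```

Passing a different placement_order, such as sample(length(y)), can change the layout and its overall width, which is the order sensitivity raised among the beeswarm's cons.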
Beeswarms are great for a few reasons. As mentioned before, they give a good idea of the distributional shape of the underlying data, with wider areas indicating higher density. They also show every single data point to the viewer, which, as we've seen, is always a good idea when possible.
7. Beeswarm cons
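A beeswarm's layout depends on the order in which points are placed. In ggbeeswarm, that order is set by the priority argument of geom_beeswarm() (for example "ascending", the default, or "random"). The sketch below assumes ggplot2 and ggbeeswarm, again uses the mpg data for illustration, and builds the same swarm with two placement orders so you can compare how wide each group's swarm ends up; the helper swarm_width() is hypothetical.

```r
library(ggplot2)
library(ggbeeswarm)

# Illustrative data (assumption): ggplot2's built-in mpg dataset.
# Same data and geometry; only the placement order differs.
p_ascending <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_beeswarm(priority = "ascending")
p_random <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_beeswarm(priority = "random")

# Hypothetical helper: horizontal spread of the built points in each group
swarm_width <- function(p) {
  d <- ggplot_build(p)$data[[1]]
  tapply(d$x, d$group, function(x) diff(range(x)))
}

set.seed(1)  # "random" priority draws a new order each time the plot is built
round(rbind(ascending = swarm_width(p_ascending),
            random    = swarm_width(p_random)), 3)
```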
Alas, beeswarms are not perfect. They don't work well with huge amounts of data: because the points can't overlap, displaying a lot of data forces the points to become so small that they are practically useless. Another con is their stacking method. The order in which points are placed, and therefore how they stack, is rather arbitrary, and poor placement can artificially inflate the width of the plot. Always check your plot to make sure the stacking looks dense and reasonable!
8. Violin plots
Another alternative to the boxplot is the violin plot. Violin plots are just the kernel density plots we learned about in the last chapter, mirrored so that each one is symmetric about its category's position on the categorical axis. To make them, we can call geom_violin where we would otherwise call geom_boxplot.
9. Violin plots
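The same drop-in swap, sketched with ggplot2's geom_violin() and its built-in mpg data (an illustrative choice, not from the lesson):

```r
library(ggplot2)

# Illustrative data (assumption): ggplot2's built-in mpg dataset.
# Replace geom_boxplot() with geom_violin(); each class gets its own mirrored density.
p_violin <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin()

p_violin
```

By default, geom_violin() trims each violin to the range of its group's data (trim = TRUE).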
Their symmetry, in theory, allows them to be placed next to each other more efficiently for comparisons.
10. Violin pros
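To see that mirroring directly, here is a base-R sketch that builds a single violin outline by hand: estimate a density with density(), then reflect it about the category's position. It uses base R's faithful dataset (Old Faithful eruption durations, which are bimodal) purely for illustration; the 0.4 half-width scaling and the position x = 1 are arbitrary choices, and geom_violin() applies its own scaling internally.

```r
# Illustrative data (assumption): base R's faithful dataset.
# Gaussian KDE with base R's default bandwidth rule (bw.nrd0);
# cut = 0 stops the curve at the data range, much like geom_violin()'s default trim
kde <- density(faithful$eruptions, cut = 0)

# Scale the density so its widest point reaches 0.4 on either side of the centre
half_width <- 0.4 * kde$y / max(kde$y)

# Mirror about the category position x = 1: right edge going up, left edge coming back down
violin_x <- c(1 + half_width, rev(1 - half_width))
violin_y <- c(kde$x, rev(kde$x))

plot(violin_x, violin_y, type = "n", xaxt = "n", xlab = "", ylab = "eruption duration (min)")
polygon(violin_x, violin_y, col = "grey85")
```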
The pros of the violin plot follow those of a KDE (kernel density estimate). Every data point contributes to the shape, so each one gets to 'have its voice heard' in a less arbitrary manner than in a beeswarm plot, because there is no stacking order to bias the result. Violins also handle large amounts of data better than beeswarm plots or boxplots with jitter, because you aren't actually _showing_ every single data point.
11. Violin cons
The cons, again, largely follow those of a standard KDE. The width (bandwidth) of the kernel you apply matters: each plot requires you to balance the amount of data you have against a kernel width that is meaningful in your data's units. Unlike with a standard KDE, it is rather hard to add a rug plot or similar encoding of the underlying data powering the density estimate without cluttering the plot. This is especially true when you also want to overlay standard summary statistics, such as a boxplot, on the violins.
12. Let's try some more advanced comparisons!
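Both trade-offs from the violin cons can be explored in ggplot2. The sketch below, again assuming ggplot2 and its mpg data for illustration, uses geom_violin()'s adjust argument, which multiplies the default bandwidth, and then overlays a narrow boxplot inside each violin; the values 0.5, 2 and 0.1 are arbitrary.

```r
library(ggplot2)

# Illustrative data (assumption): ggplot2's built-in mpg dataset
p_base <- ggplot(mpg, aes(x = class, y = hwy))

# adjust < 1 narrows the kernel: more detail, but also more noise
p_narrow <- p_base + geom_violin(adjust = 0.5)
# adjust > 1 widens the kernel: smoother, but it may hide features such as bimodality
p_wide <- p_base + geom_violin(adjust = 2)

# Summary statistics on top: a thin boxplot inside each violin
p_combo <- p_base +
  geom_violin() +
  geom_boxplot(width = 0.1, outlier.shape = NA)

p_combo
```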
Now that we've discussed the alternatives to the boxplot, let's make some ourselves.