Best practices: bar plots

1. Best practices: bar plots

Knowing how to efficiently use ggplot is a good first step,

2. In this chapter

but now we need to consider the best way to represent our data graphically and the common pitfalls to avoid.

3. Bar plots

There are two common types of bar plots. The first simply shows an absolute count value. The second uses bars to summarize a distribution, which is as terrible as it is common. Why is that?

4. Mammalian sleep

Let's return to the mammalian sleep data set. We have the eating habits of several mammals, along with the time they spend sleeping and how much of that time is REM sleep.
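As a quick reminder of what the data looks like, here is a minimal sketch. It assumes the data set is the msleep data frame that ships with ggplot2, and that the columns of interest are vore, sleep_total and sleep_rem; the transcript itself does not name them.

```r
library(ggplot2)  # assumption: the lesson's data is ggplot2's msleep

# Peek at the columns used in this chapter
# (sleep_total and sleep_rem are recorded in hours)
head(msleep[, c("name", "vore", "sleep_total", "sleep_rem")])

# Count the observations per eating habit, keeping missing values visible
vore_counts <- table(msleep$vore, useNA = "ifany")
print(vore_counts)
```

Counting the observations per group up front is useful here, because group size is exactly the information a summary bar hides.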

5. Dynamite plot

We map vore and sleep onto the x and y aesthetics, respectively, and draw the error bars as discussed in chapter one. So far so good, but we have no idea how many observations we have in each category! This plot also suggests that our data is normally distributed; if it isn't, it's not appropriate to represent it in this way. A further perceptual problem is that the bars give the impression of having data where there is none. There are certainly no mammals that sleep 0 hours a day, yet the bars start at 0. On top of that, the region above the mean contains values but no ink! What could be a better way?
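A dynamite plot like the one described could be sketched roughly as follows. This is not necessarily the exact code behind the slide: it assumes ggplot2's msleep with sleep_total on the y axis, drops rows with a missing vore, and uses a hypothetical helper, mean_sd(), to get the mean plus or minus one standard deviation. The fun argument of stat_summary() needs ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): mean +/- 1 standard deviation,
# returned in the y / ymin / ymax format that stat_summary(fun.data = ) expects
mean_sd <- function(x) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

# Assumption: drop mammals whose eating habit is unknown
sleep <- subset(msleep, !is.na(vore))

dynamite <- ggplot(sleep, aes(x = vore, y = sleep_total)) +
  # One bar per group, with its height set to the group mean
  stat_summary(fun = mean, geom = "bar", fill = "grey50") +
  # Error bars showing mean +/- 1 SD
  stat_summary(fun.data = mean_sd, geom = "errorbar", width = 0.2)

print(dynamite)
```

Each group is reduced to three numbers here, which is why the sample sizes and the shape of each distribution disappear from the plot.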

6. Individual data points

Well, first off, we can simply show the individual data points. This is necessary for ourselves, to really see what our data looks like, but it may actually already be a pretty good end point! Note that here we used geom_point and set the position to jitter with the position_jitter function so that we can control the width of the jittering. alpha is also set to 0.6 in case there is any residual overplotting. OK, so now we can start to see some strange patterns in our data set! First, it's pretty clear that we don't have much data for insectivores, and what we do have looks pretty strange. We can't really say much about them, but if we had to say something, they look bimodal. We'd need more data to draw better conclusions. Omnivores also look pretty interesting: this data appears to be positively skewed. So we can start drawing conclusions that were previously impossible to see.
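The jittered points described above might look something like this. Only alpha = 0.6 comes from the narration; the jitter width of 0.2, the seed, and the use of msleep with sleep_total are assumptions made for this sketch.

```r
library(ggplot2)

# Assumption: drop mammals whose eating habit is unknown
sleep <- subset(msleep, !is.na(vore))

# Jitter horizontally only. height = 0 keeps every sleep value exactly
# where it was measured; seed makes the random jitter reproducible.
posn_j <- position_jitter(width = 0.2, height = 0, seed = 1)

points_plot <- ggplot(sleep, aes(x = vore, y = sleep_total)) +
  geom_point(position = posn_j, alpha = 0.6)

print(points_plot)
```

Setting height = 0 is a deliberate choice: if height is left unspecified, position_jitter() also adds a small amount of vertical noise, which would shift the plotted sleep times.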

7. geom_errorbar()

Of course, we could still plot both the individual data points and the summary statistics with geom_errorbar()

8. geom_pointrange()

or geom_pointrange().
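Both variants can be sketched by layering stat_summary() on top of the jittered points. As before, this assumes msleep with sleep_total, a hypothetical mean_sd() helper for the mean plus or minus one standard deviation, and styling values (colour, sizes, widths) chosen only for illustration.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): mean +/- 1 standard deviation
mean_sd <- function(x) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

# Assumption: drop mammals whose eating habit is unknown
sleep <- subset(msleep, !is.na(vore))

# Shared base layer: the individual observations, jittered horizontally
base <- ggplot(sleep, aes(x = vore, y = sleep_total)) +
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1),
             alpha = 0.6)

# Variant 1: a point at the mean plus an error bar
with_errorbar <- base +
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3) +
  stat_summary(fun.data = mean_sd, geom = "errorbar",
               colour = "red", width = 0.1)

# Variant 2: the same summary drawn as a single pointrange
with_pointrange <- base +
  stat_summary(fun.data = mean_sd, geom = "pointrange", colour = "red")

print(with_errorbar)
print(with_pointrange)
```

Leaving out the geom_point() layer from base gives the summary-only version of either plot.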

9. Without data points

And it's obvious that we could have simply shown these summary statistics by themselves.

10. Bars are not necessary

Notice that the error bars with points are a much cleaner representation of the data. The bars are simply not necessary! Now, none of these summary plots are particularly useful in this specific case, mostly because we now know that the insectivore and omnivore data are not suitable for this kind of summary. Nonetheless, they may be perfectly good alternatives for your data, so they are worth mentioning. There are some more plotting geometries that we'll discuss in the next course when we get into statistical plots, such as box plots and violin plots. Here I just want to mention that they are also alternatives in general, but not necessarily in this situation.

11. Ready for exercises!

OK, let's explore these concepts in more detail in the exercises.
