1. Bars and dots: point data
In this chapter, we are going to switch from talking about proportions to point, or single observation, data.
2. What is point data?
What is point data? For the sake of simplicity, in this chapter, we will be referring to situations where you have some numerical observation for multiple categories.
The numerical aspect of the observation can be a wide variety of things. It could be a count, like the number of cases of a given disease in our WHO dataset, or it could be a rate, a percentage, and so on. All that matters is that we have a single number to associate with each category or class.
3. A single observation
The point about a single observation is important. If you have multiple observations for each class, then you most likely want to use a technique for visualizing distributions, a topic we will cover in the next chapter. For right now, we will stick to single observations.
4. The bar chart
The most common visualization type used for point data is the bar chart. Open any data-driven report, paper, or blog post and you're almost guaranteed to see a bar chart. Understandably then, they are super easy to make in ggplot...
Just map an x and y aesthetic and use geom_col.
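As a rough sketch of that step, the snippet below builds a bar chart from a small, made-up data frame. The names `disease_totals` and `total_cases` are hypothetical, and the counts are placeholders rather than real WHO figures; only `ggplot()`, `aes()`, and `geom_col()` come from ggplot2.

```r
library(ggplot2)

# Hypothetical totals per disease; placeholder numbers, not real WHO figures.
disease_totals <- data.frame(
  disease = c("measles", "pertussis", "diphtheria", "polio"),
  total_cases = c(400, 250, 60, 20)
)

# Map the category to x and the number to y.
# geom_col() draws one bar per row, with its height taken from the y aesthetic.
bar_plot <- ggplot(disease_totals, aes(x = disease, y = total_cases)) +
  geom_col()

print(bar_plot)
```

Because each disease appears in exactly one row, there is a single number per category, which is the point-data situation described earlier.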
5. The bar chart
This bar chart shows the total number of cases for each disease in our dataset. We can easily see that measles is the largest in magnitude, followed by pertussis. A way to make this plot better would be to order the bars, which we will learn how to do in a bit, but even without this, we can see that diphtheria's bar is taller than polio's; an impressive amount of precision.
6. Not always the best
While pie charts may be _more_ useful than they get credit for, bar charts are the opposite. They are great, accurate charts, but they are frequently used for tasks where they are not the right choice.
7. The stacking principle
Bar charts should be used to represent data that have some sort of accumulating property to them.
I like to ask myself if I can imagine stacking units of the measure on top of each other. For instance, money spent on a project can be thought of as a stack of coins.
If, however, your data don't make sense stacked, such as percent change in profit or odds of failure, a bar chart is not appropriate.
8. Why quantities?
While bars do an excellent job of providing the viewer with an accurate representation of a given value, they come with some subtle perception issues.
In a 2012 paper investigating the perceptual properties of bar charts, researchers found that people couldn't help perceiving values below the top of the bar as 'included' in the bar and values above the top of the bar as _not_ included.
So when visualizing your data with bars, make sure your values fit this perception. For example, 24 dollars is contained within 28 dollars of spending. This is also why the axis of a bar chart should always start at zero!
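To make the zero-baseline point concrete, here is a small sketch using the 24 and 28 dollar figures from the example. The data frame `spending` and the project labels "A" and "B" are hypothetical names made up for illustration.

```r
library(ggplot2)

# Hypothetical spending figures echoing the 24 vs 28 dollar example.
spending <- data.frame(
  project = c("A", "B"),
  dollars = c(24, 28)
)

# geom_col() anchors every bar at zero, so bar height is proportional
# to the value it represents.
honest_plot <- ggplot(spending, aes(x = project, y = dollars)) +
  geom_col()

# Zooming the y-axis to roughly 20-30 hides the base of each bar.
# B now looks about twice as tall as A, although it is only about 17% larger.
truncated_plot <- honest_plot +
  coord_cartesian(ylim = c(20, 30))

print(honest_plot)
print(truncated_plot)
```

In the first plot, the 24 dollar bar visibly sits inside the 28 dollar bar. In the second, the viewer can no longer see how much is "contained" in each bar, which works against the perception described above.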
9. A big deal?
There are many issues in the world larger than the misuse of bar charts in data visualization.
Ultimately, they are still fantastically valuable tools for representing a given numeric value with high precision. If your audience is simply familiar with bar charts, it's not a big deal if you use one where it may not be 100% appropriate.
However, being deliberate about when you use bar charts will help you give the most accurate impression of your data to your audience, whether that be others or yourself.
10. Let's practice!
Let's refresh our bar plot skills with some proper use-cases on our WHO data.