Distribution of one variable

1. Distribution of one variable

You might not have noticed, but you've already been creating plots that illustrate the relationship between two variables in your dataset. It's a bit unusual to lead with this, but it gets you thinking early about the multivariate structure found in most real datasets. Now, let's zoom in on working with just a single variable.

2. Marginal distribution

To compute a table of counts for a single variable like id, just pass that vector to the table function as its sole argument. One way to think of what we've done is to take the original two-way table and then sum the cells across each level of align. Since we've summed over the margins of the other variables, this is sometimes known as a marginal distribution.
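To make the idea of summing over a margin concrete, here is a minimal sketch. It assumes a small made-up data frame named comics with id and align columns; the values are invented for illustration and are not the real dataset.

```r
# Toy stand-in for the comics data; these values are invented for illustration.
comics <- data.frame(
  id    = c("Secret", "Public", "Secret", "Public", "Secret", "No Dual"),
  align = c("Good", "Good", "Bad", "Neutral", "Bad", "Good")
)

# Two-way table of counts: id in the rows, align in the columns.
two_way <- table(comics$id, comics$align)

# One-way table of counts for id alone: a single vector passed to table().
id_counts <- table(comics$id)

# Summing each row of the two-way table across the levels of align
# recovers the same counts, which is why this is called a marginal distribution.
id_margin <- rowSums(two_way)

print(id_counts)
print(id_margin)
```

Both printouts should list the same count for each level of id.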

3. Simple bar chart

The syntax to create the simple bar chart is straightforward as well: just remove the fill = align argument.
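As a rough sketch of that change, the two calls might look like the following. The comics data frame here is a made-up stand-in, and the stacked version is an assumption about what the earlier plot looked like.

```r
library(ggplot2)

# Toy stand-in for the comics data; these values are invented for illustration.
comics <- data.frame(
  id    = c("Secret", "Public", "Secret", "Public", "Secret", "No Dual"),
  align = c("Good", "Good", "Bad", "Neutral", "Bad", "Good")
)

# Assumed earlier version: stacked bars, with each bar filled by align.
stacked <- ggplot(comics, aes(x = id, fill = align)) +
  geom_bar()

# Simple bar chart: the same call with the fill aesthetic removed.
simple <- ggplot(comics, aes(x = id)) +
  geom_bar()

print(simple)
```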

4. Faceting

Another useful way to form the distribution of a single variable is to condition on a particular value of another variable. We might be interested, for example, in the distribution of id for all neutral characters. We could either filter the dataset and build a bar chart using only cases where alignment was neutral, or we could use a technique called faceting. Faceting breaks the data into subsets based on the levels of a categorical variable and then constructs a plot for each.
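The filtering route could be sketched like this. It uses base R's subset function on a made-up comics data frame; dplyr's filter function would do the same job, and the level name "Neutral" is an assumption about how the data is coded.

```r
library(ggplot2)

# Toy stand-in for the comics data; these values are invented for illustration.
comics <- data.frame(
  id    = c("Secret", "Public", "Secret", "Public", "Secret", "No Dual"),
  align = c("Good", "Good", "Bad", "Neutral", "Neutral", "Good")
)

# Keep only the cases where align is "Neutral" (assumed level name).
neutral <- subset(comics, align == "Neutral")

# Bar chart of id within the neutral characters only.
neutral_plot <- ggplot(neutral, aes(x = id)) +
  geom_bar()

print(neutral_plot)
```

This gives one plot for one level of align; repeating it for every level by hand is the work that faceting saves.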

5. Faceted bar charts

To implement this in ggplot2, we just need to add a faceting layer: the facet wrap function, then a tilde, which can be read as "broken down by" and then our variable "align". The result is three simple bar charts side-by-side, the first one corresponding to the distribution of id within all cases that have a bad alignment, and so on, for good and neutral alignments. If this plot feels familiar, it should.
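Here is a minimal sketch of that faceting layer, again on a made-up comics data frame, so the counts in the panels are illustrative only.

```r
library(ggplot2)

# Toy stand-in for the comics data; these values are invented for illustration.
comics <- data.frame(
  id    = c("Secret", "Public", "Secret", "Public", "Secret", "No Dual"),
  align = c("Good", "Good", "Bad", "Neutral", "Bad", "Good")
)

# Simple bar chart of id, broken down by align: one panel per level of align.
faceted <- ggplot(comics, aes(x = id)) +
  geom_bar() +
  facet_wrap(~ align)

print(faceted)
```

With three levels of align in the toy data, this draws three panels, ordered alphabetically: Bad, Good, Neutral.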

6. Faceting vs. stacking

In essence, it's a rearrangement of the stacked bar charts that we considered at the beginning of the chapter.

7. Faceting vs. stacking

Each facet in the plot on the left corresponds to a single stacked bar in the plot on the right. They allow you to get a sense of the distribution of a single variable,

8. Faceting vs. stacking

by looking at a single facet or a single stacked bar or

9. Faceting vs. stacking

the association between the variables, by looking across facets or across stacked bars.

10. Faceting vs. stacking

A discussion of plots for categorical data wouldn't be complete without some mention of the pie chart.

11. Pie chart vs. bar chart

The pie chart is a common way to display categorical data where the size of the slice corresponds to the proportion of cases that are in that level. Here is a pie chart for the identity variable and it looks pleasing enough. The problem with pie charts, though, is that it can be difficult to assess the relative size of the slices. Here, which is bigger: the green public slice or the grey NA slice?

12. Pie chart vs. bar chart

If we represent this data using a bar chart the answer is obvious:

13. Pie chart vs. bar chart

the proportion of public is greater. For that reason, it's generally a good idea to stick to bar charts.
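To try the comparison yourself, one possible sketch is below. It assumes a made-up comics data frame, and it builds the pie chart as a single stacked bar drawn in polar coordinates, which is one common way to get a pie in ggplot2 and not necessarily how the slide was made.

```r
library(ggplot2)

# Toy stand-in for the comics data; these values are invented for illustration.
comics <- data.frame(
  id = c("Secret", "Public", "Secret", "Public", "Secret", "No Dual", NA, NA)
)

# Proportion of cases in each level of id, keeping NA as its own level.
id_props <- prop.table(table(comics$id, useNA = "ifany"))
print(id_props)

# Pie chart: one stacked bar wrapped around a circle with coord_polar().
pie <- ggplot(comics, aes(x = 1, fill = id)) +
  geom_bar() +
  coord_polar(theta = "y")

# Bar chart of the same variable: heights are directly comparable.
bars <- ggplot(comics, aes(x = id)) +
  geom_bar()

print(bars)
```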

14. Let's practice!

Ok, now it's your turn to practice with simple bar charts and faceting.