Visualizing margins of error from the ACS

1. Visualizing margins of error in the ACS

In the previous lesson, we covered how to work with ACS margins of error using tidycensus. Beyond this, data visualization is a powerful tool for communicating the uncertainty in ACS estimates.

2. Visualizing uncertainty

A common approach to visualizing uncertainty in estimates is the error bar plot. A dot represents the location of a point estimate, and the error bars extending from it represent the range of plausible values around that estimate. In the mock example shown on the screen, point estimates with minimal uncertainty have narrow ranges, whereas those with substantial uncertainty have very wide ranges.

3. Margin of error plots in ggplot2

In ggplot2, analysts can draw error bars with the geom_errorbar() function (vertical bars) and the geom_errorbarh() function (horizontal bars). These functions work particularly well with tidycensus, which returns estimate and moe columns by default. In the example shown on the slide, each error bar spans from the ACS estimate minus the margin of error to the estimate plus the margin of error.
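
A minimal sketch of such a plot follows. It assumes a small hypothetical data frame shaped like tidycensus output (NAME, estimate, and moe columns); the county names and values are made up, so it runs without a Census API key. The commented get_acs() call shows how real data might be fetched, with B01002_001 assumed to be the median age variable. The sketch uses geom_errorbar() with orientation = "y" to draw horizontal bars; this should match geom_errorbarh(), which newer ggplot2 releases steer users away from.

```r
library(ggplot2)

# Hypothetical stand-in for get_acs() output: the NAME, estimate, and moe
# columns mirror tidycensus's default tidy format, but the county names
# and values are made up for illustration.
wyoming_age <- data.frame(
  NAME     = c("Alpha County, Wyoming", "Beta County, Wyoming",
               "Gamma County, Wyoming"),
  estimate = c(41.2, 27.9, 38.5),
  moe      = c(2.1, 0.6, 3.4),
  stringsAsFactors = FALSE
)

# With a Census API key, real data could come from a call such as
# (B01002_001 is assumed here to be the median age variable):
# wyoming_age <- tidycensus::get_acs(geography = "county",
#                                    variables = c(medianage = "B01002_001"),
#                                    state = "WY")

# Error bars run from estimate - moe to estimate + moe;
# orientation = "y" draws them horizontally, like geom_errorbarh()
p <- ggplot(wyoming_age, aes(x = estimate, y = NAME)) +
  geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe),
                orientation = "y") +
  geom_point()
```

Printing p displays the chart; like the slide's version, its counties are unsorted and each label repeats "County, Wyoming".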

4. Margin of error plots in ggplot2

Here, we see the result. As shown, however, the plot has a number of problems. Because the data points are not sorted, the graphic looks like a fleet of TIE fighters from Star Wars! While the error bars do represent uncertainty, the relationships between the estimates are difficult to parse. The chart also includes redundant information: each county name is followed by the string "County, Wyoming". Because this information is common to every county, it takes up unnecessary space on the plot. Let's use what we've learned about data visualization to clean this plot up.

5. Formatting margin of error plots

A few data processing and plot formatting steps will resolve these problems. First, we generate a new, modified dataset in which we use the str_replace() function from the stringr package to remove the redundant suffix from the county name column. In the ggplot() call, we use reorder() to order the dots on the plot by their estimates. Within the labs() function, we then specify a title that says "counties in Wyoming"; this retains important context for the viewer but states it once, rather than repeating it in every county name along the y-axis tick labels. We'll also increase the base font size to improve the plot's legibility.
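
The formatting steps just described might look like the following sketch. It again uses hypothetical, made-up data in tidycensus's tidy shape. The base-R column assignment, the point color and size, the subtitle, the axis labels, and base_size = 14 are illustrative choices, not necessarily the exact code from the slide.

```r
library(ggplot2)
library(stringr)

# Hypothetical data in tidycensus's tidy shape (made-up names and values)
wyoming_age <- data.frame(
  NAME     = c("Alpha County, Wyoming", "Beta County, Wyoming",
               "Gamma County, Wyoming"),
  estimate = c(41.2, 27.9, 38.5),
  moe      = c(2.1, 0.6, 3.4),
  stringsAsFactors = FALSE
)

# New, modified dataset: drop the redundant " County, Wyoming" suffix
wyoming_age2 <- wyoming_age
wyoming_age2$NAME <- str_replace(wyoming_age2$NAME, " County, Wyoming", "")

# reorder() sorts the y-axis categories by estimate (ascending), so the
# dots line up by value; the shared context moves into the title
p2 <- ggplot(wyoming_age2, aes(x = estimate, y = reorder(NAME, estimate))) +
  geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe),
                orientation = "y") +
  geom_point(color = "navy", size = 3) +
  labs(title = "Median age, counties in Wyoming",
       subtitle = "Hypothetical data for illustration",
       x = "ACS estimate (bars represent margin of error)",
       y = "") +
  theme_minimal(base_size = 14)  # larger base font for legibility
```

Because reorder() sorts levels in ascending order and ggplot2 draws the first level at the bottom of the y-axis, the county with the largest estimate ends up at the top.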

6. Formatted margin of error plot

The result is a much cleaner and more legible plot. An additional benefit of sorting the dots by value along the y-axis is that we can see how uncertainty affects comparisons between estimates. For example, while Weston County is one of the older counties in Wyoming as measured by median age, its precise rank is uncertain, because its error bars overlap those of other counties. Conversely, we can be confident that Albany County, home to the University of Wyoming, is the youngest county in the state.
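
The overlap reasoning above can also be checked in code. This sketch uses a hypothetical, already-cleaned data frame (fictional county names, made-up values) and flags each county whose estimate-plus-or-minus-MOE interval overlaps at least one other county's interval. Treat it as a visual heuristic rather than a formal test: non-overlapping intervals suggest a real difference, but overlapping intervals do not by themselves prove that two estimates are statistically indistinguishable.

```r
# Hypothetical cleaned data (made-up values, not real ACS figures)
ages <- data.frame(
  NAME     = c("Alpha", "Beta", "Gamma"),
  estimate = c(41.2, 27.9, 38.5),
  moe      = c(2.1, 0.6, 3.4),
  stringsAsFactors = FALSE
)

# Interval endpoints, matching the error bars in the plot
ages$lower <- ages$estimate - ages$moe
ages$upper <- ages$estimate + ages$moe

# Two intervals [a1, b1] and [a2, b2] overlap when a1 <= b2 and a2 <= b1.
# For each row, check whether its interval overlaps any other row's.
ages$overlaps_other <- vapply(seq_len(nrow(ages)), function(i) {
  others <- setdiff(seq_len(nrow(ages)), i)
  any(ages$lower[i] <= ages$upper[others] &
        ages$lower[others] <= ages$upper[i])
}, logical(1))

ages[, c("NAME", "lower", "upper", "overlaps_other")]
```

In this made-up example, Alpha and Gamma overlap, so their relative rank is uncertain, while Beta's interval stands apart from both, which mirrors the Albany County case in the plot.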

7. Let's practice!

Now, it's time to try out creating margin of error plots for yourself!
