1. Tables and summary variables in tidycensus
In the first chapter of this course, you got acquainted with tidycensus, an R package for working with data from the US Census Bureau's decennial Census and American Community Survey (ACS). In this chapter, you'll learn about some additional features of tidycensus, and how to integrate ACS data into a data analysis workflow using functions from the tidyverse.
2. Tables in the ACS
The get_acs() and get_decennial() functions in tidycensus both include an optional argument, table, that lets you request all of the variables from a table at once. If the table argument is supplied, you don't have to specify any variable IDs.
In this example, we've requested table B19001, household income in the past 12 months, for counties in Washington.
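A request like the one on the slide might look roughly like the sketch below. It assumes tidycensus is installed and a Census API key has been registered with census_api_key(); it leaves the survey (ACS 5-year by default) and the year at the package defaults, so the exact vintage depends on your tidycensus version.

```r
library(tidycensus)

# Request every variable in table B19001 (household income in the past
# 12 months) for all counties in Washington. Because the table argument
# is supplied, no individual variable IDs are needed.
wa_income <- get_acs(
  geography = "county",
  state = "WA",
  table = "B19001"
)

# Tidy output: one row per county/variable pair, with the columns
# GEOID, NAME, variable, estimate, and moe (margin of error).
head(wa_income)
```

Each county appears once per variable in the table, so the result is long rather than wide; that tidy shape is what makes the later dplyr steps straightforward.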
3. Summary variables in tidycensus
Many variables in the decennial Census or ACS are useful as denominators for other variables. Often, you'll want to normalize your data by these denominators, calculating percentages or proportions so that groups can be compared fairly. This example shows how to request total population as a summary variable with the summary_var argument; total population is a natural denominator for race and ethnicity variables.
The total population value for each county is stored in the summary_est column, with an associated margin of error in summary_moe where applicable. This lets you compare each group estimate in the estimate column with the total population of the same geographic area, in this case a county.
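The transcript doesn't list the slide's exact variable IDs, so the sketch below assumes the non-Hispanic single-race variables and the Hispanic variable from table B03002 (Hispanic or Latino origin by race), with that table's total, B03002_001, as the summary variable. As before, it assumes a registered Census API key.

```r
library(tidycensus)

# Named vector of race/ethnicity variables from table B03002.
# The names replace the raw IDs in the output's variable column.
race_vars <- c(
  White    = "B03002_003",  # Not Hispanic or Latino: White alone
  Black    = "B03002_004",  # Not Hispanic or Latino: Black or African American alone
  Native   = "B03002_005",  # Not Hispanic or Latino: American Indian and Alaska Native alone
  Asian    = "B03002_006",  # Not Hispanic or Latino: Asian alone
  HIPI     = "B03002_007",  # Not Hispanic or Latino: Native Hawaiian and Other Pacific Islander alone
  Hispanic = "B03002_012"   # Hispanic or Latino (of any race)
)

# summary_var attaches the table total (total population) to every row
# as the summary_est and summary_moe columns.
tx_race <- get_acs(
  geography = "county",
  state = "TX",
  variables = race_vars,
  summary_var = "B03002_001"
)

head(tx_race)
```

Because the summary value is repeated on every row, each group estimate sits next to its own county's denominator, with no join required.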
4. Calculating percentages
Because we requested total population as a denominator with summary_var, we can now use the summary_est column to calculate group percentages and check the result. Percentages allow a fairer comparison across Texas counties, which can have very different baseline populations. This matters a great deal in Texas, which has the 3rd-largest as well as the 2nd-, 3rd-, and 4th-smallest counties by population in the US!
We use the mutate() function here to create a new column, pct (for percentage), that stores each group's percentage of the total. The select() function then keeps only the county name, variable ID, and our new percentage column.
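The mutate() and select() steps can be wrapped in a small function so they apply to any tidycensus result that was requested with summary_var. The name add_group_pct() is hypothetical, introduced here only for illustration; the sketch assumes dplyr is installed and that the input has the NAME, variable, estimate, and summary_est columns that get_acs() returns.

```r
library(dplyr)

# Hypothetical helper: convert a tidycensus result requested with
# summary_var into each group's percentage of the summary total.
add_group_pct <- function(acs_data) {
  acs_data %>%
    # pct = group estimate as a percentage of the summary total
    mutate(pct = 100 * (estimate / summary_est)) %>%
    # keep only the county name, variable ID, and the new column
    select(NAME, variable, pct)
}
```

Applied to the Texas county race and ethnicity data, it returns one row per county and group. Those percentages needn't sum to 100 within a county unless every category of the denominator's table was requested.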
5. Let's practice!
Let's try this out in R with some exercises.