1. Searching for data with tidycensus
Requesting data with tidycensus functions such as get_acs() generally involves supplying a vector of Census variable IDs. In this lesson, we'll discuss how to find and use these IDs and how they are formatted.
2. Searching for Census variables
There are thousands of variables available across the American Community Survey (ACS) and the decennial Censuses, which can make it hard to find the ones you need. Fortunately, web resources such as Census Reporter's search tools can help, and tidycensus also includes built-in functionality for searching variables.
3. Choosing a dataset to search
The load_variables() function in tidycensus downloads a dataset of variables from the Census Bureau so that users can browse it. The function has three parameters: year, the year or end year of the dataset; dataset, the dataset to search, which in the example shown here is acs5 for the 5-year ACS; and the optional cache parameter, which stores the variables dataset on your computer to speed up future browsing.
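load_variables() itself is an R function. As a rough illustration of what it does, here is a minimal Python sketch that assumes the Census Bureau's API publishes each dataset's variable list as a JSON file at https://api.census.gov/data/{year}/acs/acs5/variables.json (and similarly for acs1). The names variables_url, parse_variables, load_variables_sketch, and the .census_cache directory are hypothetical, chosen for this example only; the cache flag simply mirrors the idea of storing the downloaded dataset locally.

```python
import json
from pathlib import Path
from urllib.request import urlopen

# Assumed mapping from tidycensus-style dataset names to Census API paths (hypothetical, partial).
DATASET_PATHS = {
    "acs1": "acs/acs1",
    "acs5": "acs/acs5",
}


def variables_url(year: int, dataset: str) -> str:
    """Build the (assumed) URL of the Census API variable listing."""
    return f"https://api.census.gov/data/{year}/{DATASET_PATHS[dataset]}/variables.json"


def parse_variables(payload: dict) -> list[tuple[str, str, str]]:
    """Flatten a {"variables": {...}} payload into sorted (name, label, concept) rows."""
    return sorted(
        (name, meta.get("label", ""), meta.get("concept", ""))
        for name, meta in payload["variables"].items()
    )


def load_variables_sketch(year: int, dataset: str, cache: bool = False,
                          cache_dir: Path = Path(".census_cache")) -> list[tuple[str, str, str]]:
    """Download (or read from a local cache) the variable listing for one dataset."""
    cache_file = cache_dir / f"{dataset}_{year}.json"
    if cache and cache_file.exists():
        # Reuse the previously stored listing instead of downloading it again.
        payload = json.loads(cache_file.read_text())
    else:
        with urlopen(variables_url(year, dataset)) as resp:
            payload = json.load(resp)
        if cache:
            # Store the raw listing so later calls can skip the download.
            cache_dir.mkdir(parents=True, exist_ok=True)
            cache_file.write_text(json.dumps(payload))
    return parse_variables(payload)
```

For example, load_variables_sketch(2016, "acs5", cache=True) would download the listing once and read it from .census_cache/acs5_2016.json on later calls. The real listing may also contain entries that are not table variables, which a fuller version would filter out.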
4. Filtering a variables dataset
Once downloaded, a variables dataset can be explored with tidyverse tools. Datasets returned by load_variables() have three columns: name, the Census ID code; label, a description of the variable's characteristics; and concept, the general group (such as the table's topic) to which the variable belongs. In the example shown here, the dataset is filtered to variables in Census table B19001, which covers household income.
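The filtering step can be sketched in Python as well. This is an illustrative analogue of the tidyverse filter, not tidycensus code: the Variable record mirrors the three columns described above, and the sample rows, with their shortened labels, are hypothetical. Matching on the table ID followed by an underscore keeps only B19001 itself, whereas a plain substring match on "B19001" could also pick up related tables whose IDs extend that number with a letter (such as B19001A).

```python
from typing import NamedTuple


class Variable(NamedTuple):
    name: str     # Census ID code
    label: str    # description of the variable's characteristics
    concept: str  # general group the variable belongs to


def filter_table(variables: list[Variable], table_id: str) -> list[Variable]:
    """Keep only variables from one table, matching the table ID plus an underscore."""
    prefix = table_id + "_"
    return [v for v in variables if v.name.startswith(prefix)]


# Hypothetical sample rows with shortened labels (not real load_variables() output).
SAMPLE = [
    Variable("B19001_001", "Total", "Household income"),
    Variable("B19001_002", "Less than $10,000", "Household income"),
    Variable("B19001A_001", "Total", "Household income (White alone householder)"),
    Variable("B19013_001", "Median household income", "Median household income"),
]

print([v.name for v in filter_table(SAMPLE, "B19001")])
# ['B19001_001', 'B19001_002']
```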
5. ACS variable structure
ACS variable IDs can be confusing, so let's walk through one: B19001_002E, the number of households with income in the past 12 months of less than $10,000. The B prefix indicates that the variable comes from a base table, which offers the most detailed data available in the ACS. Other table types available in tidycensus include collapsed tables, denoted by C; data profiles, denoted by DP; and subject tables, denoted by S.
The component 19001 is the table ID; here, the table groups related variables that cover different household income bands. 002 identifies the specific variable within that table. The suffix E stands for estimate and can be omitted when passing IDs to tidycensus functions. Almost every ACS estimate has an associated margin of error, and tidycensus returns both the estimate and the margin of error by default. Margin of error variables carry the suffix M, but you'll only see the E and M suffixes when returning data in wide format; in the default tidy format, estimates and margins of error appear in separate estimate and moe columns.
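To make the anatomy concrete, the Python sketch below splits an ID into the components just described and builds the estimate and margin-of-error column names you would see in wide output. The regular expression is an assumption based on the examples in this lesson, not an official specification; the optional column component is included on the assumption that subject-table IDs such as S1701_C01_001E use one. The names parse_acs_id, wide_columns, and AcsId are hypothetical.

```python
import re
from typing import NamedTuple, Optional

TABLE_TYPES = {"B": "base table", "C": "collapsed table", "DP": "data profile", "S": "subject table"}

ID_PATTERN = re.compile(
    r"(?P<prefix>DP|B|C|S)"    # table type
    r"(?P<table>\d+[A-Z]*)"    # table ID, e.g. 19001 (letters allow suffixes such as 19001A)
    r"(?:_(?P<column>C\d+))?"  # optional column component (assumed, e.g. subject tables)
    r"_(?P<line>\d+)"          # variable number within the table, e.g. 002
    r"(?P<suffix>[EM])?"       # optional E (estimate) or M (margin of error)
)


class AcsId(NamedTuple):
    prefix: str
    table: str
    column: Optional[str]
    line: str
    suffix: Optional[str]


def parse_acs_id(code: str) -> AcsId:
    """Split an ACS variable ID into its components; raise ValueError if it doesn't fit."""
    m = ID_PATTERN.fullmatch(code)
    if m is None:
        raise ValueError(f"unrecognized ACS variable ID: {code!r}")
    return AcsId(**m.groupdict())


def wide_columns(code: str) -> tuple[str, str]:
    """Return the estimate (E) and margin of error (M) column names for wide output."""
    base = code[:-1] if parse_acs_id(code).suffix else code
    return base + "E", base + "M"


print(parse_acs_id("B19001_002E"))
# AcsId(prefix='B', table='19001', column=None, line='002', suffix='E')
print(wide_columns("B19001_002"))
# ('B19001_002E', 'B19001_002M')
```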
6. Let's practice!
Let's get some practice searching for variables with tidycensus.