Recoding variables and calculating group sums
dplyr, one of the core packages within the tidyverse, includes numerous functions for data wrangling. This functionality allows users to recode datasets, define groups within those datasets, and perform calculations over those groups. Such operations commonly take place within a pipe, denoted with the %>% operator.
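As a small illustration of what the pipe does, the sketch below computes the same value twice: once with nested calls and once with %>%. The `scores` vector is made up for this example and is not part of the exercise data.

```r
library(dplyr)  # dplyr re-exports the %>% operator from magrittr

# Made-up values, for illustration only
scores <- c(4, 9, 16)

# Nested calls are read from the inside out
nested <- round(mean(sqrt(scores)), 1)

# Piped calls are read left to right: each result becomes
# the first argument of the next function
piped <- scores %>%
  sqrt() %>%
  mean() %>%
  round(1)

identical(nested, piped)
```

Both forms run the same functions in the same order; the piped version simply lists the steps in the order they happen.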
In this exercise, you'll work with ACS data in just such a tidyverse workflow. You'll identify the household income variables in ACS table B19001 that fall below $35,000; between $35,000 and $75,000; and at or above $75,000. You'll then tabulate the number of households that fall into each group for counties in Washington.
This exercise is part of the course
Analyzing US Census Data in R
Exercise instructions
- Filter out rows where the variable is equal to "B19001_001", as this represents the total number of households.
- Use the case_when() function to generate a column named incgroup, which you'll use to define the recoded groups.
- Use the group_by() function to group your dataset by county name and income group.
- Finally, use the summarize() function to tabulate group sums by county, then check the result.
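The four steps can be sketched on a small made-up table before you apply them to the real data. Everything in `toy_income` is hypothetical (two invented counties with invented estimates); only the column names and variable codes mirror the exercise. The `.groups = "drop"` argument is an addition here and assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data in long format; all values are made up
toy_income <- tibble(
  NAME     = rep(c("County A", "County B"), each = 4),
  variable = rep(c("B19001_001", "B19001_002", "B19001_010", "B19001_015"), times = 2),
  estimate = c(100, 20, 30, 50, 200, 80, 70, 50)
)

toy_grouped <- toy_income %>%
  # Step 1: drop the row holding the total number of households
  filter(variable != "B19001_001") %>%
  # Step 2: recode each variable code into an income group
  mutate(incgroup = case_when(
    variable < "B19001_008" ~ "below35k",
    variable < "B19001_013" ~ "35kto75k",
    TRUE ~ "above75k"
  )) %>%
  # Step 3: define groups by county name and income group
  group_by(NAME, incgroup) %>%
  # Step 4: sum the estimates within each group
  summarize(group_est = sum(estimate), .groups = "drop")

toy_grouped
```

The result has one row per county and income group, so two counties and three groups give six rows.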
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Use a tidy workflow to wrangle ACS data
wa_grouped <- wa_income %>%
  ___(___ != "B19001_001") %>%
  mutate(incgroup = ___(
    variable < "B19001_008" ~ "below35k",
    variable < "B19001_013" ~ "35kto75k",
    TRUE ~ "above75k"
  )) %>%
  ___(NAME, incgroup) %>%
  ___(group_est = sum(estimate))
wa_grouped
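The comparisons in the sample code, such as variable < "B19001_008", compare character strings rather than numbers. This works because the variable codes share a prefix and use zero-padded suffixes, so alphabetical order matches numeric order. The short base R check below illustrates this with a handful of codes chosen for the example.

```r
# A few variable codes from table B19001, stored as character strings
codes <- c("B19001_002", "B19001_007", "B19001_008",
           "B19001_012", "B19001_013", "B19001_017")

# Zero-padded suffixes make string comparison follow numeric order
cmp <- data.frame(
  code      = codes,
  below_008 = codes < "B19001_008",  # TRUE for codes _002 through _007
  below_013 = codes < "B19001_013"   # TRUE for codes _002 through _012
)

cmp
```

Because case_when() uses the first condition that matches, codes below "B19001_008" are labelled by the first rule and never reach the second, which is why the second rule only needs an upper bound.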