Recoding variables and calculating group sums

dplyr, one of the core packages within the tidyverse, includes numerous functions for data wrangling. This functionality allows users to recode datasets, define groups within those datasets, and perform calculations over those groups. Such operations commonly take place within a pipe, denoted with the %>% operator.

In this exercise, you'll work with ACS data in just such a tidyverse workflow. ACS table B19001 reports the number of households in each household income bracket. You'll identify the variables in that table that cover incomes below $35,000; between $35,000 and $75,000; and $75,000 or more. You'll then tabulate the number of households that fall into each group for counties in Washington.

This exercise is part of the course

Analyzing US Census Data in R

Exercise instructions

  • Filter out rows where the variable is equal to "B19001_001", as this represents the total number of households.
  • Use the case_when() function to generate a column named incgroup, which you'll use to define the recoded groups.
  • Use the group_by() function to group your dataset by county name and income group.
  • Finally, use the summarize() function to tabulate group sums by county, then check the result.
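
The four steps above can be sketched on a small made-up data frame. This is a minimal illustration, not the exercise's data: `toy_income`, its county names, and its estimates are hypothetical, and only the column names (`NAME`, `variable`, `estimate`) mirror those used in the exercise. The `.groups = "drop"` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical stand-in for wa_income: two counties, five variables each.
# B19001_001 is the total, so it equals the sum of the other rows per county.
toy_income <- tibble(
  NAME = rep(c("County A", "County B"), each = 5),
  variable = rep(
    c("B19001_001", "B19001_002", "B19001_008", "B19001_012", "B19001_013"),
    times = 2
  ),
  estimate = c(100, 20, 30, 10, 40, 200, 50, 60, 40, 50)
)

toy_grouped <- toy_income %>%
  # Drop the total row so it is not counted in any income group
  filter(variable != "B19001_001") %>%
  # case_when() checks conditions in order; the first match wins
  mutate(incgroup = case_when(
    variable < "B19001_008" ~ "below35k",
    variable < "B19001_013" ~ "35kto75k",
    TRUE ~ "above75k"
  )) %>%
  # One output row per county and income group
  group_by(NAME, incgroup) %>%
  summarize(group_est = sum(estimate), .groups = "drop")

toy_grouped
```

The filter has to come first: "B19001_001" sorts before "B19001_008", so an unfiltered total row would be added to the below35k group.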

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Use a tidy workflow to wrangle ACS data
wa_grouped <- wa_income %>%
  ___(___ != "B19001_001") %>%
  mutate(incgroup = ___(
    variable < "B19001_008" ~ "below35k", 
    variable < "B19001_013" ~ "35kto75k", 
    TRUE ~ "above75k"
  )) %>%
  ___(NAME, incgroup) %>%
  ___(group_est = sum(estimate))

wa_grouped
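
The conditions in case_when() compare variable codes as character strings, not numbers. The base R sketch below shows why that works here: the codes share a prefix and a zero-padded, fixed-width suffix, so alphabetical order matches bracket order. It assumes the codes run from B19001_002 to B19001_017; the object names are illustrative only.

```r
# Build the bracket codes B19001_002 through B19001_017 (assumed range)
codes <- sprintf("B19001_%03d", 2:17)

# The same comparisons case_when() applies, written as explicit subsets
below35k <- codes[codes < "B19001_008"]
mid      <- codes[codes >= "B19001_008" & codes < "B19001_013"]
above75k <- codes[codes >= "B19001_013"]

below35k  # B19001_002 through B19001_007
mid       # B19001_008 through B19001_012
above75k  # B19001_013 through B19001_017

# Without zero padding, string order and numeric order disagree:
# "10" sorts before "8" because "1" comes before "8"
unpadded_misorders <- "B19001_10" < "B19001_8"
unpadded_misorders
```

In case_when() the middle condition needs no explicit lower bound, because rows matching the first condition have already been assigned.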