
Preparing your data set for further analysis

Your data still looks a bit messy, so it's time to clean it up with data manipulation techniques. You will do this using dplyr, an easy-to-use R package for performing the most common data manipulation tasks.

dplyr makes use of the pipe operator: %>%. Pipes take the output from one function and feed it to the first argument of the next function. You can even chain operations. The following two commands do the same thing:

tail(head(ac_survey, 20), 5)
ac_survey %>% head(20) %>% tail(5)

The piped command reads as: "take ac_survey, then apply the head() function with the optional argument 20, then apply tail() with the optional argument 5." Both commands return rows 16 through 20 of ac_survey.
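To check the equivalence yourself, here is a small sketch that uses R's built-in mtcars data set as a stand-in for ac_survey (mtcars is only an illustration and is not part of the exercise). It compares the nested and piped forms directly.

```r
library(dplyr)  # attaches the %>% pipe (re-exported from magrittr)

# Nested form: read from the inside out
nested <- tail(head(mtcars, 20), 5)

# Piped form: read left to right; each result becomes the
# first argument of the next call
piped <- mtcars %>% head(20) %>% tail(5)

identical(nested, piped)
#> [1] TRUE
```

Because x %>% f(y) is evaluated as f(x, y), the piped chain makes exactly the same calls as the nested one; only the reading order changes.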

The ac_survey data set that you imported in the previous exercise is already available, as is the small data frame degree_codes.

Instructions
  • Use a chain of piping operators on ac_survey to build ac_survey_clean:
    • First, convert ac_survey into a tbl with tbl_df() (already coded for you).
    • Next, remove observations that have NA values with na.omit().
    • Then, keep observations for which SCHL %in% c(21, 22, 24) with filter().
    • Finally, inner_join() with degree_codes to add a more understandable Degree column.
  • Print out ac_survey_clean and inspect the result of your work.
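The steps above can be sketched as a single pipe chain. Because the real ac_survey and degree_codes are not shown here, the sketch builds tiny hypothetical stand-ins: the income column and all values are invented for illustration, and degree_codes is assumed to share the SCHL column with ac_survey. The exercise uses tbl_df(), which recent dplyr releases deprecate in favour of as_tibble(), so the sketch calls tibble::as_tibble() for the same conversion.

```r
library(dplyr)

# Hypothetical stand-in for ac_survey (the real data comes from the exercise)
ac_survey <- data.frame(
  SCHL   = c(21, 22, 24, 16, 21, NA),
  income = c(52000, 61000, NA, 30000, 45000, 40000)
)

# Hypothetical stand-in for degree_codes, assumed to be keyed by SCHL
degree_codes <- data.frame(
  SCHL   = c(21, 22, 24),
  Degree = c("Bachelor", "Master", "Doctorate")
)

ac_survey_clean <- ac_survey %>%
  tibble::as_tibble() %>%              # plays the role of tbl_df() in the exercise
  na.omit() %>%                        # drop every row containing an NA
  filter(SCHL %in% c(21, 22, 24)) %>%  # keep only the three degree codes
  inner_join(degree_codes, by = "SCHL")  # add the readable Degree column

ac_survey_clean
```

With these stand-in values, na.omit() removes the rows with a missing income or SCHL, filter() then drops the SCHL = 16 row, and inner_join() attaches a Degree label to each of the three remaining rows.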