Preserving the most common levels
Sometimes you don't want to keep levels by proportion but instead keep the n most
common levels. Let's see how the levels kept for MLMethodNextYearSelect
change when we keep by number instead of by proportion. multiple_choice_responses
has been loaded for you.
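To make the difference between the two approaches concrete, here is a minimal sketch on a small made-up factor. The counts are hypothetical and are not taken from multiple_choice_responses; only the use of fct_lump() with either n or prop reflects the approach described above.

```r
library(forcats)

# Hypothetical toy data (not the survey): 13 responses across 5 methods
toy_methods <- factor(rep(
  c("Deep learning", "Neural Nets", "Time Series Analysis",
    "Bayesian Methods", "Text Mining"),
  times = c(6, 3, 2, 1, 1)
))

# Keep by number: retain the 2 most common levels, lump the rest into "Other"
by_n <- fct_lump(toy_methods, n = 2)

# Keep by proportion: retain levels that make up more than 10% of the values
by_prop <- fct_lump(toy_methods, prop = 0.1)

levels(by_n)
levels(by_prop)
```

With these toy counts, keeping by number retains two methods, while the 10% threshold also retains Time Series Analysis (2 of 13 responses), so the same factor ends up with a different set of levels depending on the rule.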
This exercise is part of the course
Categorical Data in the Tidyverse
Exercise instructions
- Remove people who didn't select a method.
- Create a new variable, ml_method, from MLMethodNextYearSelect that preserves the 5 most common methods and lumps the rest as "other method" using the other_level argument.
- Count the frequency of each ml_method, sorting in descending order.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
multiple_choice_responses %>%
  # Remove NAs
  filter(___) %>%
  # Create ml_method, retaining the 5 most common methods and renaming others "other method"
  mutate(ml_method = ___(MLMethodNextYearSelect, ___, other_level = ___)) %>%
  # Count the frequency of your new variable, sorted in descending order
  ___(ml_method, ___)
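One way the template above could be completed is sketched below on a small stand-in data frame. The name toy_responses and its counts are hypothetical and used only for illustration. The choice of fct_lump() with n = 5 is an assumption based on the instructions, not a confirmed reference solution.

```r
library(dplyr)
library(forcats)

# Hypothetical stand-in for multiple_choice_responses: 7 methods plus 2 NAs
toy_responses <- tibble(
  MLMethodNextYearSelect = c(
    rep(
      c("Deep learning", "Neural Nets", "Time Series Analysis",
        "Bayesian Methods", "Text Mining", "Genetic Algorithms",
        "Social Network Analysis"),
      times = c(7, 6, 5, 4, 3, 1, 1)
    ),
    NA, NA
  )
)

toy_counts <- toy_responses %>%
  # Remove respondents who didn't select a method
  filter(!is.na(MLMethodNextYearSelect)) %>%
  # Keep the 5 most common methods and lump the rest as "other method"
  mutate(ml_method = fct_lump(MLMethodNextYearSelect, n = 5,
                              other_level = "other method")) %>%
  # Count each level, most frequent first
  count(ml_method, sort = TRUE)

toy_counts
```

In this toy data the two rarest methods collapse into "other method" with a combined count of 2, and the NA rows are dropped before lumping so they do not appear in the counts.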