Correcting inconsistency
Now that you've identified that dest_size has whitespace inconsistencies and cleanliness has capitalization inconsistencies, you'll use the new tools at your disposal to fix the inconsistent values in sfo_survey rather than removing the data points entirely. Dropping them could add bias to your dataset if more than 5% of the data points would need to be removed.
dplyr and stringr are loaded and sfo_survey is available.
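To make the 5% rule of thumb concrete, here is a minimal sketch of how you might measure the share of rows affected by stray whitespace. The toy_survey data frame and the share_affected name are hypothetical and stand in for sfo_survey, which is not reproduced here.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for sfo_survey
toy_survey <- tibble(
  dest_size = c("Small", "  Small", "Medium", "Large  ", "Large")
)

# A row is affected if trimming surrounding whitespace changes its value
share_affected <- toy_survey %>%
  summarize(share = mean(dest_size != str_trim(dest_size))) %>%
  pull(share)

# TRUE here means dropping the affected rows would exceed the 5% threshold
share_affected > 0.05
```

If the share is above 5%, repairing the values is generally the safer choice than dropping the rows.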
This exercise is part of the course
Cleaning Data in R
Instructions
- Add a column to sfo_survey called dest_size_trimmed that contains the values in the dest_size column with all leading and trailing whitespace removed.
- Add another column called cleanliness_lower that contains the values in the cleanliness column converted to all lowercase.
- Count the number of occurrences of each category in dest_size_trimmed.
- Count the number of occurrences of each category in cleanliness_lower.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Add new columns to sfo_survey
sfo_survey <- sfo_survey %>%
  # dest_size_trimmed: dest_size without whitespace
  mutate(dest_size_trimmed = ___,
         # cleanliness_lower: cleanliness converted to lowercase
         cleanliness_lower = ___)

# Count values of dest_size_trimmed
sfo_survey %>%
  ___

# Count values of cleanliness_lower
sfo_survey %>%
  ___
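
One possible way to fill in the blanks is sketched below, assuming stringr's str_trim() and str_to_lower() together with dplyr's count(). The small tibble is a hypothetical stand-in for sfo_survey, whose actual contents are not shown here, and dest_counts and clean_counts are illustrative names.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for sfo_survey; the real data has more rows and columns
sfo_survey <- tibble(
  dest_size   = c("  Small  ", "Medium", "Large  ", " Hub", "Small"),
  cleanliness = c("Clean", "AVERAGE", "clean", "Somewhat Clean", "average")
)

# Add cleaned copies of both columns, leaving the originals untouched
sfo_survey <- sfo_survey %>%
  mutate(
    # str_trim() removes leading and trailing whitespace (side = "both" by default)
    dest_size_trimmed = str_trim(dest_size),
    # str_to_lower() converts every character to lowercase
    cleanliness_lower = str_to_lower(cleanliness)
  )

# Count occurrences of each cleaned category
dest_counts <- sfo_survey %>%
  count(dest_size_trimmed)

clean_counts <- sfo_survey %>%
  count(cleanliness_lower)
```

Keeping the cleaned values in new columns rather than overwriting the originals makes it easy to compare the counts before and after and confirm that the duplicate categories have collapsed.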