Correcting inconsistency

Now that you've identified that dest_size has whitespace inconsistencies and cleanliness has capitalization inconsistencies, you'll use the new tools at your disposal to fix the inconsistent values in sfo_survey. Fixing the values is preferable to removing the affected data points entirely, since dropping them could bias your dataset if more than 5% of the data points had to be removed.

dplyr and stringr are loaded and sfo_survey is available.

This exercise is part of the course

Cleaning Data in R

Exercise instructions

  • Add a column to sfo_survey called dest_size_trimmed that contains the values in the dest_size column with all leading and trailing whitespace removed.
  • Add another column called cleanliness_lower that contains the values in the cleanliness column converted to all lowercase.
  • Count the number of occurrences of each category in dest_size_trimmed.
  • Count the number of occurrences of each category in cleanliness_lower.
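
The steps above can be tried out on a small made-up data frame before working with sfo_survey itself. The sketch below is only an illustration: toy_survey and its values are hypothetical stand-ins, not the course data, and it assumes that str_trim() and str_to_lower() from stringr and count() from dplyr are the intended tools for this exercise.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for sfo_survey (not the course dataset)
toy_survey <- tibble(
  dest_size   = c("  Small  ", "Medium", "Medium ", " Large"),
  cleanliness = c("Clean", "CLEAN", "average", "Average")
)

toy_survey <- toy_survey %>%
  mutate(
    # str_trim() removes leading and trailing whitespace only
    dest_size_trimmed = str_trim(dest_size),
    # str_to_lower() converts every character to lowercase
    cleanliness_lower = str_to_lower(cleanliness)
  )

# count() returns one row per distinct value, with the tally in column n
dest_counts  <- toy_survey %>% count(dest_size_trimmed)
clean_counts <- toy_survey %>% count(cleanliness_lower)

print(dest_counts)
print(clean_counts)
```

After cleaning, values such as "Medium" and "Medium " fall into a single category, so the counts reflect the real number of distinct responses rather than formatting differences.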

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Add new columns to sfo_survey
sfo_survey <- sfo_survey %>%
  # dest_size_trimmed: dest_size without leading and trailing whitespace
  mutate(dest_size_trimmed = ___,
         # cleanliness_lower: cleanliness converted to lowercase
         cleanliness_lower = ___)

# Count values of dest_size_trimmed
sfo_survey %>%
  ___

# Count values of cleanliness_lower
sfo_survey %>%
  ___