Exercise

Correcting inconsistency

Now that you've identified that dest_size has whitespace inconsistencies and cleanliness has capitalization inconsistencies, you'll use your new tools to fix the inconsistent values in sfo_survey rather than removing those data points entirely. Dropping them could bias your dataset if more than 5% of the data points would need to be removed.
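To see why these fixes collapse the inconsistent values into a single category each, here is a small sketch on made-up vectors (the values below are hypothetical, not taken from sfo_survey). It uses stringr's str_trim(), which strips leading and trailing whitespace, and str_to_lower(), which converts text to lowercase.

```r
library(stringr)

# Hypothetical sample values for illustration only (not from sfo_survey)
sizes   <- c("Small", "  Small", "Small  ", "Hub", "Hub  ")
ratings <- c("Clean", "CLEAN", "clean", "Dirty", "DIRTY")

# str_trim() removes whitespace from both ends by default (side = "both")
sizes_trimmed <- str_trim(sizes)

# str_to_lower() converts every character to lowercase
ratings_lower <- str_to_lower(ratings)

length(unique(sizes))     # 5 distinct strings before trimming
unique(sizes_trimmed)     # "Small" "Hub"
unique(ratings_lower)     # "clean" "dirty"
```

Each variant differs only in padding or case, so after cleaning they compare equal and are counted as one category.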

dplyr and stringr are loaded and sfo_survey is available.

Instructions
  • Add a column to sfo_survey called dest_size_trimmed that contains the values in the dest_size column with all leading and trailing whitespace removed.
  • Add another column called cleanliness_lower that contains the values in the cleanliness column converted to all lowercase.
  • Count the number of occurrences of each category in dest_size_trimmed.
  • Count the number of occurrences of each category in cleanliness_lower.
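The four steps above can be sketched with dplyr's mutate() and count(). This is one possible approach, not the official solution; because the real sfo_survey isn't reproduced here, the sketch builds a small hypothetical stand-in with only the two relevant columns (in the exercise you would use the provided sfo_survey instead of defining it).

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for sfo_survey; the real data has more rows and columns
sfo_survey <- tibble(
  dest_size   = c("Small", "Small  ", "  Hub", "Hub", "Medium"),
  cleanliness = c("Clean", "CLEAN", "Somewhat clean", "somewhat clean", "Dirty")
)

# Add the cleaned columns alongside the originals
sfo_survey <- sfo_survey %>%
  mutate(
    dest_size_trimmed = str_trim(dest_size),        # drop leading/trailing spaces
    cleanliness_lower = str_to_lower(cleanliness)   # normalize capitalization
  )

# count() returns one row per category, with its frequency in column n
size_counts  <- sfo_survey %>% count(dest_size_trimmed)
clean_counts <- sfo_survey %>% count(cleanliness_lower)

size_counts
clean_counts
```

Keeping the original columns and adding new ones, rather than overwriting dest_size and cleanliness, makes it easy to compare the raw and cleaned values side by side.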