Correcting inconsistency

Now that you've identified that dest_size has whitespace inconsistencies and cleanliness has capitalization inconsistencies, you'll use the new tools at your disposal to fix the inconsistent values in sfo_survey. Fixing the values is preferable to removing the data points entirely, since dropping them could add bias to your dataset if more than 5% of the data points need to be dropped.

dplyr and stringr are loaded and sfo_survey is available.

Add a column to sfo_survey called dest_size_trimmed that contains the values in the dest_size column with all leading and trailing whitespace removed.
Add another column called cleanliness_lower that contains the values in the cleanliness column converted to all lowercase.
Count the number of occurrences of each category in dest_size_trimmed.
Count the number of occurrences of each category in cleanliness_lower.
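
The four steps above can be sketched as follows. This is one possible approach, not necessarily the intended solution: it assumes stringr's str_trim() and str_to_lower() are the tools meant here, and it runs on a small made-up data frame called sfo_survey_toy (a hypothetical stand-in, since the real sfo_survey is not reproduced on this page).

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for sfo_survey; the values are invented for illustration.
sfo_survey_toy <- data.frame(
  dest_size   = c("Small", "  Small  ", "Medium", "Large  ", "  Hub"),
  cleanliness = c("Clean", "CLEAN", "average", "Average", "Somewhat clean"),
  stringsAsFactors = FALSE
)

sfo_survey_toy <- sfo_survey_toy %>%
  mutate(
    # Remove leading and trailing whitespace from dest_size
    dest_size_trimmed = str_trim(dest_size),
    # Convert cleanliness to all lowercase
    cleanliness_lower = str_to_lower(cleanliness)
  )

# Count the occurrences of each category in the cleaned columns
dest_size_counts <- sfo_survey_toy %>%
  count(dest_size_trimmed)

cleanliness_counts <- sfo_survey_toy %>%
  count(cleanliness_lower)

print(dest_size_counts)
print(cleanliness_counts)
```

On the real data, you would apply the same mutate() and count() calls to sfo_survey directly. Categories that differed only by surrounding whitespace or capitalization should then collapse into a single row in each count.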

