Correcting inconsistency

Now that you've identified whitespace inconsistencies in dest_size and capitalization inconsistencies in cleanliness, you'll use the new tools at your disposal to fix the inconsistent values in sfo_survey rather than removing those data points entirely. Dropping them could bias your dataset if more than 5% of the data points would need to be removed.

dplyr and stringr are loaded and sfo_survey is available.
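
To get a feel for the 5% guideline, you could estimate what share of rows would be lost if the inconsistent values were dropped. This is a minimal sketch under stated assumptions: toy_survey and its values are invented for illustration and are not the real sfo_survey data.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data; the real sfo_survey has more rows and columns
toy_survey <- tibble(
  dest_size = c("Small", "  Small  ", "Medium", "Large ", "Hub", "Medium")
)

# Share of values that differ from their trimmed form,
# i.e. values carrying leading or trailing whitespace
share_inconsistent <- toy_survey %>%
  summarize(share = mean(dest_size != str_trim(dest_size))) %>%
  pull(share)

share_inconsistent
```

In this toy example two of the six values carry stray whitespace, well above 5%, so repairing the values is the safer choice than dropping them.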

This exercise is part of the course

Cleaning Data in R

Exercise instructions

  • Add a column to sfo_survey called dest_size_trimmed that contains the values in the dest_size column with all leading and trailing whitespace removed.
  • Add another column called cleanliness_lower that contains the values in the cleanliness column converted to all lowercase.
  • Count the number of occurrences of each category in dest_size_trimmed.
  • Count the number of occurrences of each category in cleanliness_lower.

Hands-on interactive exercise

Try to solve this exercise by completing the sample code.

# Add new columns to sfo_survey
sfo_survey <- sfo_survey %>%
  # dest_size_trimmed: dest_size without whitespace
  mutate(dest_size_trimmed = ___,
         # cleanliness_lower: cleanliness converted to lowercase
         cleanliness_lower = ___)

# Count values of dest_size_trimmed
sfo_survey %>%
  ___

# Count values of cleanliness_lower
sfo_survey %>%
  ___
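
One possible way to approach the steps above, sketched on a small made-up data frame instead of sfo_survey. The name toy_survey and all of its values are assumptions for illustration; str_trim() and str_to_lower() come from stringr and count() from dplyr, but this is not necessarily the course's reference solution.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for sfo_survey; values are invented for illustration
toy_survey <- tibble(
  dest_size   = c("Small", "  Small  ", "Medium", "Large ", "Hub", "Medium"),
  cleanliness = c("Clean", "CLEAN", "Average", "average", "Dirty", "clean")
)

toy_survey <- toy_survey %>%
  mutate(
    # Remove leading and trailing whitespace
    dest_size_trimmed = str_trim(dest_size),
    # Convert every value to lowercase
    cleanliness_lower = str_to_lower(cleanliness)
  )

# Count occurrences of each cleaned category
dest_counts <- toy_survey %>%
  count(dest_size_trimmed)

clean_counts <- toy_survey %>%
  count(cleanliness_lower)
```

After cleaning, variants such as "  Small  " and "Small", or "CLEAN" and "clean", fall into the same category, so the counts reflect the true categories rather than formatting differences.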