Trimming strings
In the previous exercise, you were able to identify the correct data type and convert user_birth_year to the correct type, allowing you to extract counts that gave you a bit more insight into the dataset.
Another common dirty data problem is numbers that carry extra characters, such as percent signs or periods, which cause them to be read in as characters. Before you can compute with these values, the extra characters need to be removed and the values converted from character to numeric. In this exercise, you'll convert the duration column from character to numeric, but first the word "minutes" needs to be removed from each value.
dplyr, assertive, and stringr are loaded and bike_share_rides is available.
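To make the idea concrete, here is a minimal sketch of the same clean-then-convert pattern applied to percent signs. The vector `pct_raw` and its values are hypothetical and are not part of bike_share_rides; only `str_remove()` from stringr and base R's `as.numeric()` are used.

```r
library(stringr)

# Hypothetical values read in as character because of the trailing "%"
pct_raw <- c("45%", "12.5%", "80%")

# str_remove() deletes the first match of the pattern in each string
pct_trimmed <- str_remove(pct_raw, "%")

# With the extra character gone, the strings parse cleanly as numbers
pct_num <- as.numeric(pct_trimmed)

class(pct_raw)  # character
class(pct_num)  # numeric
```

Note that `str_remove()` removes only the first match per string; if a value could contain the unwanted text more than once, `str_remove_all()` is the variant to reach for.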
This exercise is part of the course
Cleaning Data in R
Instructions
- Use str_remove() to remove "minutes" from the duration column of bike_share_rides. Add this as a new column called duration_trimmed.
- Convert the duration_trimmed column to a numeric type and add this as a new column called duration_mins.
- Glimpse at bike_share_rides and assert that the duration_mins column is numeric.
- Calculate the mean of duration_mins.
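The steps above can be sketched end to end on a small stand-in data frame. Everything about `toy_rides` (its name and its three values) is an assumption made for illustration; the real bike_share_rides data is not reproduced here. The check uses base R's `stopifnot(is.numeric(...))` so the sketch runs with only dplyr and stringr; with assertive loaded, `assert_is_numeric()` plays the same role.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for bike_share_rides with a character duration column
toy_rides <- tibble(duration = c("10 minutes", "25 minutes", "4 minutes"))

toy_rides <- toy_rides %>%
  # Strip the word "minutes", then convert the remainder to numeric
  # (as.numeric() ignores the leftover trailing space)
  mutate(duration_trimmed = str_remove(duration, "minutes"),
         duration_mins = as.numeric(duration_trimmed))

# Inspect the column types
glimpse(toy_rides)

# Stop with an error if the conversion did not produce a numeric column
stopifnot(is.numeric(toy_rides$duration_mins))

# Average duration in minutes for the toy data
mean_duration <- mean(toy_rides$duration_mins)
mean_duration
```

Keeping duration_trimmed as its own column, rather than nesting both calls, makes it easy to inspect the intermediate strings if the conversion produces unexpected NA values.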
Hands-on interactive exercise
Try this exercise by completing this sample code.
bike_share_rides <- bike_share_rides %>%
  # Remove 'minutes' from duration: duration_trimmed
  mutate(duration_trimmed = ___,
         # Convert duration_trimmed to numeric: duration_mins
         duration_mins = ___)
# Glimpse at bike_share_rides
___
# Assert duration_mins is numeric
___
# Calculate mean duration
___