Trimming strings
In the previous exercise, you were able to identify the correct data type and convert user_birth_year to the correct type, allowing you to extract counts that gave you a bit more insight into the dataset.
Another common dirty data problem is numbers that contain extra characters, such as percent signs or periods, which cause them to be read in as characters. Before you can do calculations with these numbers, the extra characters need to be removed and the values converted from character to numeric. In this exercise, you'll convert the duration column from character to numeric, but first the word "minutes" needs to be removed from each value.
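To see why the trimming step comes first, here is a small sketch on a made-up vector, `durations`, that is assumed to be in the same format as the `duration` column. `as.numeric()` cannot parse a string like `"12 minutes"` and returns `NA`. Once `str_remove()` has dropped the word, strings such as `"12 "` convert cleanly, because `as.numeric()` ignores leading and trailing whitespace.

```r
library(stringr)

# Hypothetical values in the same format as the duration column
durations <- c("12 minutes", "5 minutes", "24 minutes")

# Converting directly fails: every value becomes NA (warning suppressed here)
suppressWarnings(as.numeric(durations))
# [1] NA NA NA

# Remove the word first; the leftover trailing space is harmless
trimmed <- str_remove(durations, "minutes")

# as.numeric() ignores surrounding whitespace, so this parses correctly
as.numeric(trimmed)
# [1] 12  5 24
```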
dplyr, assertive, and stringr are loaded and bike_share_rides is available.
This exercise is part of the course Cleaning Data in R.
Exercise instructions
- Use str_remove() to remove "minutes" from the duration column of bike_share_rides. Add this as a new column called duration_trimmed.
- Convert the duration_trimmed column to a numeric type and add this as a new column called duration_mins.
- Glimpse at bike_share_rides and assert that the duration_mins column is numeric.
- Calculate the mean of duration_mins.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
bike_share_rides <- bike_share_rides %>%
  # Remove 'minutes' from duration: duration_trimmed
  mutate(duration_trimmed = ___,
         # Convert duration_trimmed to numeric: duration_mins
         duration_mins = ___)

# Glimpse at bike_share_rides
___

# Assert duration_mins is numeric
___

# Calculate mean duration
___
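One possible way to fill in the scaffold is sketched below. Because the real bike_share_rides data isn't available outside the exercise, the sketch builds a tiny hypothetical stand-in with only `ride_id` and `duration` columns. The exercise loads assertive, whose `assert_is_numeric()` would fit the assertion step. Here, base R's `stopifnot(is.numeric(...))` is used instead, so the sketch runs without that extra package.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for bike_share_rides: durations stored as text
bike_share_rides <- data.frame(
  ride_id = 1:3,
  duration = c("12 minutes", "5 minutes", "24 minutes"),
  stringsAsFactors = FALSE
)

bike_share_rides <- bike_share_rides %>%
  # Remove 'minutes' from duration: duration_trimmed
  mutate(duration_trimmed = str_remove(duration, "minutes"),
         # Convert duration_trimmed to numeric: duration_mins
         duration_mins = as.numeric(duration_trimmed))

# Glimpse at bike_share_rides: duration_mins should now be <dbl>
glimpse(bike_share_rides)

# Assert duration_mins is numeric (base R swapped in for assertive here)
stopifnot(is.numeric(bike_share_rides$duration_mins))

# Calculate mean duration
mean(bike_share_rides$duration_mins)
# [1] 13.66667
```

Keeping `duration_trimmed` as its own column mirrors the exercise instructions. It also makes it easy to inspect the intermediate strings if any value fails to convert and shows up as `NA` in `duration_mins`.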