Trimming strings
In the previous exercise, you were able to identify the correct data type and convert user_birth_year
to the correct type, allowing you to extract counts that gave you a bit more insight into the dataset.
Another common dirty data problem is having extra bits like percent signs or periods in numbers, causing them to be read in as characters. In order to be able to crunch these numbers, the extra bits need to be removed and the numbers need to be converted from character to numeric. In this exercise, you'll need to convert the duration column from character to numeric, but before this can happen, the word "minutes" needs to be removed from each value.
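To see why the text has to be stripped before conversion, here is a minimal sketch on a plain character vector. The values are hypothetical and only mimic the format described above; they are not taken from bike_share_rides.

```r
library(stringr)

# Hypothetical values in the same "<number> minutes" format (an assumption)
duration <- c("12 minutes", "24 minutes", "8 minutes")

# Converting directly fails: each value becomes NA, and R emits a warning
direct <- suppressWarnings(as.numeric(duration))

# Removing the word first leaves text that as.numeric() can parse;
# the leftover trailing space is ignored during conversion
trimmed <- str_remove(duration, "minutes")
duration_mins <- as.numeric(trimmed)
```

Here direct contains only NA values, while duration_mins holds the numbers themselves.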
dplyr, assertive, and stringr are loaded and bike_share_rides is available.
This exercise is part of the course
Cleaning Data in R
Exercise instructions
- Use str_remove() to remove "minutes" from the duration column of bike_share_rides. Add this as a new column called duration_trimmed.
- Convert the duration_trimmed column to a numeric type and add this as a new column called duration_mins.
- Glimpse at bike_share_rides and assert that the duration_mins column is numeric.
- Calculate the mean of duration_mins.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
bike_share_rides <- bike_share_rides %>%
  # Remove 'minutes' from duration: duration_trimmed
  mutate(duration_trimmed = ___,
         # Convert duration_trimmed to numeric: duration_mins
         duration_mins = ___)
# Glimpse at bike_share_rides
___
# Assert duration_mins is numeric
___
# Calculate mean duration
___