Summing strings and concatenating numbers
In the previous exercise, you identified category as the correct data type for user_type and converted the column so that you could extract statistical summaries describing the distribution of user_type.
Another common data type problem is importing what should be numerical values as strings. On strings, operations such as summing and multiplication produce concatenation and repetition rather than numerical results.
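The effect can be seen in a minimal sketch. The Series below is a hypothetical stand-in and is not part of the course data; dtype=object is set explicitly so the values are held as plain Python strings.

```python
import pandas as pd

# Hypothetical values: numbers that were imported as strings
durations = pd.Series(["12", "8"], dtype=object)

# Summing joins the strings end to end instead of adding them
total = durations.sum()
print(total)  # 128

# Multiplying repeats each string instead of scaling a number
doubled = durations * 2
print(doubled.tolist())  # ['1212', '88']
```

Once the values are converted to integers, the same operations give 20 and [24, 16].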
In this exercise, you'll convert the string column duration to the type int. Before that, however, you will need to strip "minutes" from the column so that pandas can read it as numerical. The pandas package has been imported as pd.
This exercise is part of the course Cleaning Data in Python.
Exercise instructions
- Use the .strip() method to strip duration of "minutes" and store it in the duration_trim column.
- Convert duration_trim to int and store it in the duration_time column.
- Write an assert statement that checks if duration_time's data type is now an int.
- Print the average ride duration.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Strip duration of minutes
ride_sharing['duration_trim'] = ride_sharing['duration'].____.____()
# Convert duration to integer
ride_sharing['duration_time'] = ____
# Write an assert statement making sure of conversion
assert ride_sharing['____'].____ == '____'
# Print formed columns and calculate average ride duration
print(ride_sharing[['duration','duration_trim','duration_time']])
print(____)
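
One possible way to fill in the blanks is sketched below. It is not necessarily the course's reference solution. The small ride_sharing DataFrame is a hypothetical stand-in, since the real data is not shown here, and the extra whitespace strip is a defensive addition that the exercise does not ask for.

```python
import pandas as pd

# Hypothetical stand-in for the course's ride_sharing DataFrame
ride_sharing = pd.DataFrame({"duration": ["12 minutes", "8 minutes", "25 minutes"]})

# .str.strip("minutes") removes any of the characters m, i, n, u, t, e, s
# from both ends of each value. The second .str.strip() (an assumption, not
# required by the exercise) drops the space left behind before the unit.
ride_sharing["duration_trim"] = (
    ride_sharing["duration"].str.strip("minutes").str.strip()
)

# Convert the trimmed strings to integers
ride_sharing["duration_time"] = ride_sharing["duration_trim"].astype("int")

# Check that the conversion produced an integer column
assert ride_sharing["duration_time"].dtype == "int"

# Print the columns side by side, then the average ride duration
print(ride_sharing[["duration", "duration_trim", "duration_time"]])
average_duration = ride_sharing["duration_time"].mean()
print(average_duration)
```

Because .strip() with an argument treats it as a set of characters rather than a literal suffix, it is only safe here because the numeric part of each value contains none of those letters.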