Summing strings and concatenating numbers
In the previous exercise, you identified category as the correct data type for user_type and converted the column so you could extract statistical summaries that shed light on its distribution.
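As a reminder of why that conversion matters, here is a minimal sketch on a small hypothetical DataFrame. It assumes user_type is stored as integer codes (the values below are made up for illustration, and user_type_cat is a hypothetical column name). Once the column is a category, describe() reports counts and the most frequent value instead of a meaningless mean and standard deviation.

```python
import pandas as pd

# Hypothetical toy data: user_type stored as integer codes (assumption for illustration)
ride_sharing = pd.DataFrame({"user_type": [1, 2, 2, 3, 2]})

# As integers, describe() reports numeric statistics (mean, std, ...) that make little sense for labels
print(ride_sharing["user_type"].describe())

# As a category, describe() reports count, unique, top (most frequent value) and freq instead
ride_sharing["user_type_cat"] = ride_sharing["user_type"].astype("category")
summary = ride_sharing["user_type_cat"].describe()
print(summary)
```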
Another common data type problem is importing what should be numerical values as strings. When that happens, mathematical operations such as summing and multiplication produce string concatenation instead of numerical results.
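The problem is easy to reproduce. The sketch below uses made-up ride durations (hypothetical values, not the exercise data) stored once as strings with object dtype and once as integers, and sums each Series.

```python
import pandas as pd

# Hypothetical durations imported as strings (object dtype) instead of numbers
durations_str = pd.Series(["12", "24", "8"], dtype=object)
# The same durations stored as integers
durations_num = pd.Series([12, 24, 8])

# Summing strings glues them together end to end
print(durations_str.sum())  # prints 12248
# Summing numbers adds them
print(durations_num.sum())  # prints 44
```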
In this exercise, you'll convert the string column duration to the type int. Before that, however, you need to strip "minutes" from the column so that pandas can read the values as numbers. The pandas package has been imported as pd.
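One detail worth knowing before you strip the column: pandas' str.strip() treats its argument as a set of characters to remove from both ends, not as a literal word. The sketch below assumes values in the "<number> minutes" format described above (the Series is hypothetical). It shows that the space before "minutes" survives. The second .str.strip() call is an extra defensive step added here as an assumption, not part of the exercise's expected answer.

```python
import pandas as pd

# Hypothetical raw values in the "<number> minutes" format,
# stored as plain Python strings (object dtype)
duration = pd.Series(["12 minutes", "24 minutes", "8 minutes"], dtype=object)

# str.strip("minutes") removes any of the characters m, i, n, u, t, e, s from
# both ends of each value; it does not remove the literal word, so the space
# before "minutes" survives
stripped = duration.str.strip("minutes")
print(stripped.tolist())  # prints ['12 ', '24 ', '8 ']

# Extra defensive step (not required by the exercise): strip the leftover
# whitespace so the cast to int sees digits only
cleaned = stripped.str.strip().astype("int")
print(cleaned.tolist())  # prints [12, 24, 8]
```

Because digits are not in the character set, they are never removed, which is why stripping "minutes" this way is safe for this column format.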
This exercise is part of the course Cleaning Data in Python.
Exercise instructions
- Use the .strip() method to strip "minutes" from duration and store the result in the duration_trim column.
- Convert duration_trim to int and store it in the duration_time column.
- Write an assert statement that checks whether duration_time's data type is now int.
- Print the average ride duration.
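The last two steps can be illustrated on a small hypothetical column of already-trimmed strings (made-up values, not the exercise data). The sketch casts to the platform default int, checks the dtype with an assert in the style the instructions ask for, adds a width-independent check via pd.api.types.is_integer_dtype as an optional alternative, and then averages the values.

```python
import pandas as pd

# Hypothetical stand-in for the trimmed column: digits only, still stored as strings
ride_sharing = pd.DataFrame({"duration_trim": pd.Series(["12", "24", "8"], dtype=object)})

# Cast to the platform default integer type
ride_sharing["duration_time"] = ride_sharing["duration_trim"].astype("int")

# Passes silently when the cast worked; raises AssertionError otherwise
assert ride_sharing["duration_time"].dtype == "int"
# Optional alternative that accepts any integer width (int32, int64, ...)
assert pd.api.types.is_integer_dtype(ride_sharing["duration_time"])

# With a numeric dtype, mean() averages the values instead of failing on strings
avg_duration = ride_sharing["duration_time"].mean()
print(avg_duration)
```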
Have a go at this exercise by completing this sample code.
# Strip duration of minutes
ride_sharing['duration_trim'] = ride_sharing['duration'].____.____(____)
# Convert duration to integer
ride_sharing['duration_time'] = ____
# Write an assert statement making sure of conversion
assert ride_sharing['____'].____ == '____'
# Print formed columns and calculate average ride duration
print(ride_sharing[['duration','duration_trim','duration_time']])
print(____)