Extracting string patterns
The Length column in the hiking dataset is a column of strings, but contained in the column is the mileage for the hike. We're going to extract this mileage using regular expressions, and then use a lambda in pandas to apply the extraction to the DataFrame.
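As a minimal sketch of the extraction step on a single string, the snippet below uses re.search with a pattern that accepts whole numbers and decimals. The sample value "0.8 miles" is an assumption made for illustration; the actual strings in the hiking dataset may be formatted differently.

```python
import re

# Hypothetical sample value; the real hiking data may be formatted differently.
sample = "0.8 miles"

# \d+ matches one or more digits; (?:\.\d+)? optionally matches a decimal part.
match = re.search(r"\d+(?:\.\d+)?", sample)

# re.search returns None when nothing matches, so check before using the match.
mileage = float(match.group(0)) if match is not None else None
print(mileage)
```

group(0) returns the full text matched by the pattern, which is why it can be passed straight to float().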
This exercise is part of the course Preprocessing for Machine Learning in Python.
Exercise instructions
- Search the text in the length argument for numbers and decimals using an appropriate pattern.
- Extract the matched pattern and convert it to a float.
- Apply the return_mileage() function to each row in the hiking["Length"] column.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Write a pattern to extract numbers and decimals
def return_mileage(length):

    # Search the text for matches
    mile = re.____(____, ____)

    # If a value is returned, use group(0) to return the found value
    if mile is not None:
        return float(____)

# Apply the function to the Length column and take a look at both columns
hiking["Length_num"] = ____.apply(____)
print(hiking[["Length", "Length_num"]].head())
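
For comparison, here is one way the same idea might look when run end to end on a toy DataFrame. The rows, the toy variable, and the extract_mileage() helper are all invented for this sketch and are not part of the exercise; the pattern \d+\.\d+ is one reasonable choice and only matches numbers that include a decimal point.

```python
import re
import pandas as pd

# Toy stand-in for the hiking dataset; these rows are invented for illustration.
toy = pd.DataFrame({"Length": ["0.8 miles", "1.0 mile", "0.75 miles", "Various"]})

def extract_mileage(length):
    # Hypothetical helper: look for digits, a decimal point, then more digits.
    mile = re.search(r"\d+\.\d+", length)
    # Return the matched text as a float, or None when nothing matched.
    if mile is not None:
        return float(mile.group(0))
    return None

# Apply the helper to every value in the column using a lambda.
toy["Length_num"] = toy["Length"].apply(lambda row: extract_mileage(row))
print(toy[["Length", "Length_num"]].head())
```

Rows without a decimal number, such as "Various", end up as missing values in the new column. Note that re.search expects a string, so a column containing non-string entries would need those handled before the function is applied.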