Extracting string patterns

The Length column in the hiking dataset holds strings, but each string contains the mileage for the hike. We're going to extract this mileage using a regular expression, and then use apply() in pandas, optionally with a lambda, to run the extraction on every row of the DataFrame.
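
As a minimal sketch of the regex step, assume the mileage appears as a decimal number inside text such as "0.8 miles". The sample strings below are made up for illustration and are not taken from the hiking dataset. Under that assumption, re.search() with the pattern \d+\.\d+ finds the first decimal number in a string:

```python
import re

# Hypothetical sample strings; the real Length values may look different
samples = ["0.8 miles", "1.0 mile", "Various"]

for text in samples:
    # \d+ matches one or more digits, \. matches a literal decimal point
    match = re.search(r"\d+\.\d+", text)
    # re.search returns None when nothing in the string matches
    value = float(match.group(0)) if match is not None else None
    print(text, "->", value)
```

This pattern requires a decimal point, so a whole number such as "3 miles" would not match. If the data may contain whole numbers, \d+(?:\.\d+)? is one variant that accepts both forms.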

This exercise is part of the course

Preprocessing for Machine Learning in Python

Exercise instructions

  • Search the text in the length argument for numbers and decimals using an appropriate pattern.
  • Extract the matched pattern and convert it to a float.
  • Apply the return_mileage() function to each row in the hiking["Length"] column.
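
The three steps above can be sketched end to end on a small stand-in DataFrame. Everything here is an assumption for illustration: hiking_demo and its values are invented, and extract_mileage() is a hypothetical helper rather than the course's reference solution. The isinstance check is an extra guard, not required by the instructions, so that missing values do not raise an error inside re.search().

```python
import re
import pandas as pd

# Hypothetical stand-in for the hiking dataset; values are illustrative only
hiking_demo = pd.DataFrame({"Length": ["0.8 miles", "1.0 mile", "Various", None]})

def extract_mileage(length):
    # Skip missing or non-string values, which re.search cannot handle
    if not isinstance(length, str):
        return None
    # Search the text for the first decimal number
    match = re.search(r"\d+\.\d+", length)
    # group(0) holds the full matched text; convert it to a float
    if match is not None:
        return float(match.group(0))
    return None

# Series.apply calls the function once per value in the column
hiking_demo["Length_num"] = hiking_demo["Length"].apply(extract_mileage)
print(hiking_demo[["Length", "Length_num"]])
```

Passing the function directly to apply() is equivalent to wrapping it in a lambda such as lambda row: extract_mileage(row); the lambda form is mainly useful when extra arguments are needed.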

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

import re

# Write a pattern to extract numbers and decimals
def return_mileage(length):
    
    # Search the text for matches
    mile = re.____(____, ____)
    
    # If a value is returned, use group(0) to return the found value
    if mile is not None:
        return float(____)
        
# Apply the function to the Length column and take a look at both columns
hiking["Length_num"] = ____.apply(____)
print(hiking[["Length", "Length_num"]].head())