Extracting string patterns

The Length column in the hiking dataset holds strings, but each string contains the mileage for the hike. We'll extract this mileage with a regular expression, then use a lambda in pandas to apply the extraction to the DataFrame.
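
As a rough illustration of the extraction step, the sketch below runs re.search() on a few made-up length strings. The sample strings and the pattern \d+\.\d+ are assumptions chosen for illustration; the actual values in the hiking dataset may be formatted differently.

```python
import re

# Hypothetical sample values; the real Length column may look different
samples = ["0.8 miles", "1.0 mile", "Various"]

# Assumed pattern: one or more digits, a literal dot, one or more digits
pattern = r"\d+\.\d+"

extracted = []
for text in samples:
    match = re.search(pattern, text)
    # re.search returns None when nothing in the text matches
    extracted.append(float(match.group(0)) if match is not None else None)

print(extracted)  # [0.8, 1.0, None]
```

Checking for None before calling group(0) matters because a string without a decimal number would otherwise raise an AttributeError.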

This exercise is part of the course

Preprocessing for Machine Learning in Python

Instructions

  • Search the text in the length argument for numbers and decimals using an appropriate pattern.
  • Extract the matched pattern and convert it to a float.
  • Apply the return_mileage() function to each row in the hiking["Length"] column.
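
The steps above could look roughly like the following on a toy DataFrame. This is a minimal sketch, not the exercise solution: the toy data, the helper name extract_number, and the pattern are all assumptions made for illustration.

```python
import re
import pandas as pd

# Toy stand-in for the hiking DataFrame (values are made up)
toy = pd.DataFrame({"Length": ["0.8 miles", "1.0 mile", "Various"]})

def extract_number(text):
    # Hypothetical helper: return the first decimal number in text as a float
    match = re.search(r"\d+\.\d+", text)
    if match is not None:
        return float(match.group(0))
    # No match: fall through with an explicit None
    return None

# Run the helper on every value in the column via a lambda
toy["Length_num"] = toy["Length"].apply(lambda value: extract_number(value))
print(toy[["Length", "Length_num"]])
```

Passing the function directly, as in toy["Length"].apply(extract_number), would behave the same way; the lambda is shown only because the exercise text mentions one. Rows with no match end up as missing values in the new column.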

Hands-on interactive exercise

Try this exercise by completing this sample code.

import re

# Write a pattern to extract numbers and decimals
def return_mileage(length):
    
    # Search the text for matches
    mile = re.____(____, ____)
    
    # If a value is returned, use group(0) to return the found value
    if mile is not None:
        return float(____)
        
# Apply the function to the Length column and take a look at both columns
hiking["Length_num"] = ____.apply(____)
print(hiking[["Length", "Length_num"]].head())