Session Ready
Exercise

Engineering features from strings - extraction

The Length column in the hiking dataset is a column of strings, but contained in the column is the mileage for the hike. We're going to extract this mileage using regular expressions, and then use a lambda in Pandas to apply the extraction to the DataFrame.

Instructions
100 XP
  • Create a pattern that will extract numbers and decimals from text, using \d+ to get numbers and \. to get decimals, and pass it into re's compile function.
  • Use re's match function to search the text, passing in the pattern and the length text.
  • Use the matched mile's group() attribute to extract the matched pattern, making sure to match group 0, and pass it into float.
  • Apply the return_mileage() function to the hiking["Length"] column.