Extracting Text to New Features
Garages are an important consideration for houses in Minnesota, where most people own a car and snow is annoying to clear off a car parked outside. The type of garage matters too: can you get to your car without braving the cold? Let's create a feature, has_attached_garage, that captures whether the garage is attached to the house.
This exercise is part of the course
Feature Engineering with PySpark
Instructions
- Import the needed function when() from pyspark.sql.functions.
- Create a string matching condition using like() to look for the string pattern Attached Garage in df['GARAGEDESCRIPTION'], and use % wildcards so it will match anywhere in the field.
- Similarly, create another condition using like() to find the string pattern Detached Garage in df['GARAGEDESCRIPTION'], and use % wildcards so it will match anywhere in the field.
- Create a new column has_attached_garage using when() to assign the value 1 if the house has an attached garage and 0 if the garage is detached, and use otherwise() to assign null with None if it is neither.
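To make the matching rules in these instructions concrete, here is a rough plain-Python sketch of the logic, with no Spark required. The helper names sql_like and garage_label are hypothetical and exist only for illustration; the emulation covers the % and _ wildcards but ignores escape characters, so treat it as an approximation of SQL LIKE rather than a faithful reimplementation.

```python
import re
from typing import Optional


def sql_like(value: Optional[str], pattern: str) -> Optional[bool]:
    """Approximate SQL LIKE: % matches any run of characters, _ matches one.

    Hypothetical helper for illustration; escape characters are not handled.
    """
    if value is None:
        return None  # LIKE on a null value yields null rather than False
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    # The whole value must match, which is why the pattern needs % on both sides
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def garage_label(description: Optional[str]) -> Optional[int]:
    """Return 1 for attached, 0 for detached, None when neither pattern matches."""
    if sql_like(description, "%Attached Garage%"):
        return 1
    if sql_like(description, "%Detached Garage%"):
        return 0
    return None  # plays the role of otherwise(None)
```

Because the match is case-sensitive, "Detached Garage" does not satisfy the "%Attached Garage%" pattern, so the two conditions do not overlap.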
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Import needed functions
____ ____ ____ ____
# Create boolean conditions for string matches
has_attached_garage = df[____].____(____)
has_detached_garage = df[____].____(____)
# Conditional value assignment 
df = df.withColumn(____, (____(____, 1)
                                          .____(____, 0)
                                          .____(____)))
# Inspect results
df[['GARAGEDESCRIPTION', 'has_attached_garage']].show(truncate=100)