
Extracting Text to New Features

Garages are an important consideration for houses in Minnesota, where most people own a car and clearing snow off a car parked outside is a chore. The type of garage matters too: can you get to your car without braving the cold or not? Let's create a feature, has_attached_garage, that captures whether the garage is attached to the house.

This exercise is part of the course

Feature Engineering with PySpark


Instructions

  • Import the needed function when() from pyspark.sql.functions.
  • Create a string matching condition using like() to look for the string pattern Attached Garage in df['GARAGEDESCRIPTION'], using % wildcards so it matches anywhere in the field.
  • Similarly, create another condition using like() to find the string pattern Detached Garage in df['GARAGEDESCRIPTION'] and use wildcards % so it will match anywhere in the field.
  • Create a new column has_attached_garage using when() to assign the value 1 if the house has an attached garage and 0 if it has a detached garage, and use otherwise() to assign None (null) if it has neither.
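
To make the matching rules in these steps concrete, here is a minimal sketch in plain Python rather than PySpark. It imitates how a LIKE pattern with % wildcards matches anywhere in a field, and how the 1 / 0 / None assignment falls out of the two conditions. The helper names sql_like and attached_garage_flag and the sample descriptions are hypothetical, made up for illustration; they are not part of the course code or the PySpark API.

```python
import re

def sql_like(value, pattern):
    """Plain-Python stand-in for SQL LIKE (hypothetical helper, not PySpark).

    % matches any run of characters, _ matches exactly one character,
    and matching is case-sensitive. A missing value yields None,
    mirroring how NULL propagates through LIKE in SQL.
    """
    if value is None:
        return None
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None

def attached_garage_flag(description):
    """1 if attached, 0 if detached, None if neither pattern matches."""
    if sql_like(description, "%Attached Garage%"):
        return 1
    if sql_like(description, "%Detached Garage%"):
        return 0
    return None

# Hypothetical sample descriptions, not taken from the course dataset
samples = [
    "Attached Garage",
    "Detached Garage, Driveway - Asphalt",
    "Uncovered/Open",
    None,
]
flags = [attached_garage_flag(s) for s in samples]
print(flags)
```

Because the first condition is checked before the second, a description that mentions both garage types would get the value 1 in this sketch; when() and its chained conditions are evaluated in order in the same way.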

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Import needed functions
____ ____ ____ ____

# Create boolean conditions for string matches
has_attached_garage = df[____].____(____)
has_detached_garage = df[____].____(____)

# Conditional value assignment 
df = df.withColumn(____, (____(____, 1)
                                          .____(____, 0)
                                          .____(____)))

# Inspect results
df[['GARAGEDESCRIPTION', 'has_attached_garage']].show(truncate=100)