Aan de slagGa gratis aan de slag

Extracting Text to New Features

Garages are an important consideration for houses in Minnesota where most people own a car and the snow is annoying to clear off a car parked outside. The type of garage is also important, can you get to your car without braving the cold or not? Let's look at creating a feature has_attached_garage that captures whether the garage is attached to the house or not.

Deze oefening maakt deel uit van de cursus

Feature Engineering with PySpark

Cursus bekijken

Oefeninstructies

  • Import the needed function when() from pyspark.sql.functions.
  • Create a string matching condition using like() to look for for the string pattern Attached Garage in df['GARAGEDESCRIPTION'] and use wildcards % so it will match anywhere in the field.
  • Similarly, create another condition using like() to find the string pattern Detached Garage in df['GARAGEDESCRIPTION'] and use wildcards % so it will match anywhere in the field.
  • Create a new column has_attached_garage using when() to assign the value 1 if it has an attached garage, zero if detached and use otherwise() to assign null with None if it is neither.

Praktische interactieve oefening

Probeer deze oefening eens door deze voorbeeldcode in te vullen.

# Import needed functions
____ ____ ____ ____

# Create boolean conditions for string matches
has_attached_garage = df[____].____(____)
has_detached_garage = df[____].____(____)

# Conditional value assignment 
df = df.withColumn(____, (____(____, 1)
                                          .____(____, 0)
                                          .____(____)))

# Inspect results
df[['GARAGEDESCRIPTION', 'has_attached_garage']].show(truncate=100)
Code bewerken en uitvoeren