Splitting & Exploding
Taking a compound field like GARAGEDESCRIPTION and massaging it into something useful is an involved process, so it helps to understand early what value you might gain from expanding it. In this example, we will convert the string to a list-like array, explode it, and then inspect the unique values.
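The three steps described above (split, explode, distinct) can be pictured with a plain-Python analogue before touching Spark. This is only a sketch of the idea, not PySpark code, and the sample strings are hypothetical values invented for illustration rather than rows from the course dataset.

```python
# Plain-Python analogue of split -> explode -> distinct (not PySpark itself).
# The sample strings below are hypothetical, made up for illustration.
rows = [
    "Attached Garage, Heated Garage",
    "Detached Garage",
    "Attached Garage, Insulated Garage",
]

# "split": each compound string becomes a list of individual values
garage_lists = [row.split(", ") for row in rows]

# "explode": emit one record per list element
exploded = [value for garage_list in garage_lists for value in garage_list]

# "distinct": keep each value only once (sorted for a stable display order)
unique_values = sorted(set(exploded))
print(unique_values)
```

Three input rows become five exploded records, which collapse to four unique values. Seeing the unique values is what tells you whether expanding the field is worth the effort.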
This exercise is part of the course
Feature Engineering with PySpark
Exercise instructions
- Import the needed functions split() and explode() from pyspark.sql.functions.
- Use split() to create a new column garage_list by splitting df['GARAGEDESCRIPTION'] on ', ', which is both a comma and a space.
- Create a new record for each value in df['garage_list'] using explode() and assign it to a new column ex_garage_list.
- Use distinct() to get the unique values of ex_garage_list and show() the first 100 rows, truncating them at 50 characters to display the values.
Interactive exercise
Complete the sample code to successfully finish this exercise.
# Import needed functions
____ ____ ____ ____, ____
# Convert string to list-like array
df = df.withColumn(____, ____(____, ____))
# Explode the values into new records
ex_df = df.withColumn(____, ____(____))
# Inspect the values
ex_df[['ex_garage_list']].____().____(100, ____=____)