
Splitting & Exploding

Turning a compound field like GARAGEDESCRIPTION into something useful is an involved process, so it helps to understand early what value you might gain from expanding it. In this example, we will convert the string to a list-like array, explode it into one record per value, and then inspect the unique values.
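To make the split, explode, and distinct sequence concrete, here is a minimal plain-Python sketch of what each step does to the data. It is only an analogy for the Spark operations, not PySpark itself, and the sample rows and the `id` key are hypothetical; the real GARAGEDESCRIPTION values may look different.

```python
# Hypothetical sample rows; real GARAGEDESCRIPTION values may differ.
rows = [
    {"id": 1, "GARAGEDESCRIPTION": "Attached Garage, Heated Garage"},
    {"id": 2, "GARAGEDESCRIPTION": "Detached Garage"},
    {"id": 3, "GARAGEDESCRIPTION": "Attached Garage, Driveway - Asphalt"},
]

# Step 1 (split): turn each compound string into a list, splitting on ', '.
split_rows = [
    {**r, "garage_list": r["GARAGEDESCRIPTION"].split(", ")} for r in rows
]

# Step 2 (explode): emit one output row per list element,
# repeating the other fields of the source row.
exploded = [
    {**r, "ex_garage_list": value}
    for r in split_rows
    for value in r["garage_list"]
]

# Step 3 (distinct): collect the unique exploded values.
unique_values = sorted({r["ex_garage_list"] for r in exploded})

print(len(exploded))   # 3 input rows become 5 exploded rows
print(unique_values)
```

Note that exploding multiplies rows: any later count or aggregate over the exploded data counts each original record once per list element.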

This exercise is part of the course

Feature Engineering with PySpark


Exercise instructions

  • Import the needed functions split() and explode() from pyspark.sql.functions.
  • Use split() to create a new column garage_list by splitting df['GARAGEDESCRIPTION'] on ', ' (a comma followed by a space).
  • Create a new record for each value in df['garage_list'] using explode() and assign it to a new column ex_garage_list.
  • Use distinct() to get the unique values of ex_garage_list and show the first 100 rows, truncating each value at 50 characters.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Import needed functions
____ ____ ____ ____, ____

# Convert string to list-like array
df = df.withColumn(____, ____(____, ____))

# Explode the values into new records
ex_df = df.withColumn(____, ____(____))

# Inspect the values
ex_df[['ex_garage_list']].____().____(100, ____=____)