Assembling columns

The final stage of data preparation is to consolidate all of the predictor columns into a single column.
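Here is a rough illustration of what "consolidating" means. The plain-Python sketch below runs without Spark and mimics, for a single row, the rule the Spark documentation gives for VectorAssembler: the values of the input columns are concatenated into one vector in the order they are listed, and a column that already holds a vector contributes all of its elements. The helper `assemble_row` and the sample values are hypothetical. Spark itself produces an ML `Vector`, not a Python list.

```python
from typing import List, Mapping, Sequence, Union

# A cell is either a single number or an already-assembled vector of numbers.
Cell = Union[int, float, Sequence[float]]


def assemble_row(row: Mapping[str, Cell], input_cols: Sequence[str]) -> List[float]:
    """Hypothetical helper: concatenate the chosen columns of one row into a flat vector."""
    features: List[float] = []
    for col in input_cols:  # the order of input_cols fixes the order in the vector
        value = row[col]
        if isinstance(value, (int, float)):
            features.append(float(value))
        else:  # a vector-valued column is flattened into the output
            features.extend(float(v) for v in value)
    return features


# Hypothetical row: two numeric predictors, one vector-valued column and a label.
row = {"km": 2107.0, "dow": 2, "extra": [0.0, 1.0], "delay": 30}
print(assemble_row(row, ["dow", "km", "extra"]))  # [2.0, 2107.0, 0.0, 1.0]
```

The label column (`delay` here) is simply left out of `input_cols`, so it never ends up in the feature vector.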

An updated version of the flights data, which takes into account all of the changes from the previous few exercises, has the following predictor columns:

  • mon, dom and dow
  • carrier_idx (indexed value from carrier)
  • org_idx (indexed value from org)
  • km
  • depart
  • duration
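The `_idx` columns are needed because, according to the Spark documentation, VectorAssembler accepts only numeric, boolean and vector input columns. String columns such as carrier and org must therefore be turned into numbers first. The sketch below is a plain-Python imitation of the default ordering used by Spark's StringIndexer, where the most frequent label gets index 0.0. The function name and the carrier codes are hypothetical. Breaking ties alphabetically is an assumption of this sketch.

```python
from collections import Counter
from typing import Dict, List, Sequence


def fit_string_index(values: Sequence[str]) -> Dict[str, float]:
    """Hypothetical helper: map each label to an index, most frequent label first."""
    counts = Counter(values)
    # Sort by descending frequency, then alphabetically to break ties.
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


carriers = ["UA", "AA", "UA", "OO", "AA", "UA"]  # hypothetical carrier codes
mapping = fit_string_index(carriers)
carrier_idx: List[float] = [mapping[c] for c in carriers]
print(mapping)      # {'UA': 0.0, 'AA': 1.0, 'OO': 2.0}
print(carrier_idx)  # [0.0, 1.0, 0.0, 2.0, 1.0, 0.0]
```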

Note: The truncate=False argument to the show() method prevents data being truncated in the output.
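By default, show() truncates any cell longer than 20 characters, and an assembled features vector easily exceeds that length. The plain-Python sketch below approximates that behaviour for one cell. The cut to the first 17 characters plus "..." reflects our reading of Spark's `Dataset.showString` source and should be treated as an assumption. The function name and the sample cell are hypothetical.

```python
def show_cell(text: str, truncate: int = 20) -> str:
    """Hypothetical helper approximating how show() renders one cell.

    PySpark turns truncate=False into 0, which leaves the text untouched.
    """
    if truncate > 0 and len(text) > truncate:
        if truncate < 4:  # very small limits get no ellipsis
            return text[:truncate]
        return text[: truncate - 3] + "..."
    return text


cell = "[2.0,11.0,3.0,1.0,4.0,2107.0,16.33,195.0]"  # hypothetical features cell
print(show_cell(cell))                  # [2.0,11.0,3.0,1.0...
print(show_cell(cell, truncate=False))  # the full string
```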

This exercise is part of the course

Machine Learning with PySpark

Exercise instructions

  • Import the class which will assemble the predictors.
  • Create an assembler object that will allow you to merge the predictor columns into a single column.
  • Use the assembler to generate a new consolidated column.
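These three steps follow Spark ML's transformer pattern. The assembler is configured once with `inputCols` and `outputCol`, and `transform()` returns a new DataFrame with the output column appended while keeping the input columns. The toy class below imitates that contract on a list of dictionaries so it runs without Spark. `ToyAssembler` and the sample rows are hypothetical. Its camelCase parameter names only mirror the Spark ones, and the real VectorAssembler produces ML vectors rather than lists.

```python
from typing import Any, Dict, List, Sequence


class ToyAssembler:
    """Hypothetical stand-in for VectorAssembler: configure once, then transform."""

    def __init__(self, inputCols: Sequence[str], outputCol: str) -> None:
        self.inputCols = list(inputCols)
        self.outputCol = outputCol

    def transform(self, rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # Build new rows with the output column appended; the input rows stay unchanged.
        return [
            {**row, self.outputCol: [float(row[c]) for c in self.inputCols]}
            for row in rows
        ]


flights = [  # hypothetical rows holding a subset of the predictor columns
    {"mon": 0, "km": 2107.0, "duration": 195, "delay": 30},
    {"mon": 2, "km": 860.0, "duration": 120, "delay": -5},
]
assembler = ToyAssembler(inputCols=["mon", "km", "duration"], outputCol="features")
flights_assembled = assembler.transform(flights)
for row in flights_assembled:
    print(row["features"], row["delay"])
# [0.0, 2107.0, 195.0] 30
# [2.0, 860.0, 120.0] -5
```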

Hands-on interactive exercise

Try this exercise by completing the sample code below.

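# Import the necessary class
from pyspark.ml.feature import VectorAssembler

# Create an assembler object that merges the predictor columns,
# in this order, into a single vector column named 'features'
assembler = VectorAssembler(inputCols=[
    'mon', 'dom', 'dow', 'carrier_idx', 'org_idx', 'km', 'depart', 'duration'
], outputCol='features')

# Consolidate predictor columns
# (assumes `flights` is the prepared DataFrame from the previous exercises)
flights_assembled = assembler.transform(flights)

# Check the resulting column
flights_assembled.select('features', 'delay').show(5, truncate=False)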