
Assembling columns

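The final stage of data preparation is to consolidate all of the predictor columns into a single vector column, which is the form Spark ML estimators expect (their input column is named features by default).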

An updated version of the flights data, incorporating all of the changes from the previous exercises, has the following predictor columns:

  • mon, dom and dow (month, day of month and day of week)
  • carrier_idx (indexed value from carrier)
  • org_idx (indexed value from org)
  • km
  • depart
  • duration
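Conceptually, assembling the predictors means taking each row's values for these columns, in a fixed order, and packing them into one numeric vector. The plain-Python sketch below models that row-wise idea using the column names listed above and invented row values; it is an illustration only, not how Spark implements the assembler, and the helper name assemble is hypothetical.

```python
# Predictor columns, in the order they will appear in the assembled vector
PREDICTORS = ['mon', 'dom', 'dow', 'carrier_idx', 'org_idx', 'km', 'depart', 'duration']


def assemble(row, input_cols, output_col='features'):
    """Hypothetical helper: return a copy of `row` with the predictors packed into one list.

    Conceptual model only: each value is converted to float and placed
    in the order given by `input_cols`; other columns are left untouched.
    """
    assembled = dict(row)
    assembled[output_col] = [float(row[col]) for col in input_cols]
    return assembled


# Invented example row, including the target column 'delay'
row = {'mon': 0, 'dom': 22, 'dow': 2, 'carrier_idx': 0.0, 'org_idx': 0.0,
       'km': 3465.0, 'depart': 16.33, 'duration': 351, 'delay': 30}

print(assemble(row, PREDICTORS)['features'])
# [0.0, 22.0, 2.0, 0.0, 0.0, 3465.0, 16.33, 351.0]
```

The target column delay is deliberately left out of the vector: only the predictors go into features, while delay stays in its own column.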

Note: The truncate=False argument to the show() method prevents long values from being truncated in the output.
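By default, show() shortens any cell value longer than 20 characters, which an assembled vector easily exceeds. The function below is a rough plain-Python model of that rule (keep the first 17 characters and append "..."); the name truncate_cell is hypothetical and not part of PySpark, and the feature string is invented.

```python
def truncate_cell(text, width=20):
    """Hypothetical helper: rough model of how show() shortens long cell values."""
    if width > 0 and len(text) > width:
        # Very narrow widths are cut without adding an ellipsis
        if width < 4:
            return text[:width]
        return text[:width - 3] + '...'
    return text


features = '[0.0,22.0,2.0,0.0,0.0,3465.0,16.33,351.0]'
print(truncate_cell(features))  # [0.0,22.0,2.0,0.0...
```

With truncate=False no shortening is applied, so the whole vector is printed.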

This exercise is part of the course Machine Learning with PySpark.

Exercise instructions

  • Import the class which will assemble the predictors.
  • Create an assembler object that will allow you to merge the predictor columns into a single column.
  • Use the assembler to generate a new consolidated column.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import the necessary class
from pyspark.ml.feature import ____

# Create an assembler object
assembler = ____(inputCols=[
    ____
], outputCol='features')

# Consolidate predictor columns
flights_assembled = assembler.____(____)

# Check the resulting column
flights_assembled.select('features', 'delay').show(5, truncate=False)