Assembling columns

The final stage of data preparation is to consolidate all of the predictor columns into a single column, because Spark ML models expect their features as one vector column.

An updated version of the flights data, which takes into account all of the changes from the previous few exercises, has the following predictor columns:

mon, dom and dow (month, day of month and day of week)
carrier_idx (indexed value from carrier)
org_idx (indexed value from org)
km
depart
duration

Note: The truncate=False argument to the show() method prevents long values from being truncated in the output.

This exercise is part of the course

Machine Learning with PySpark

Exercise instructions

Import the class which will assemble the predictors.
Create an assembler object that will allow you to merge the predictor columns into a single column.
Use the assembler to generate a new consolidated column.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import the necessary class
from pyspark.ml.feature import ____

# Create an assembler object
assembler = ____(inputCols=[
    ____
], outputCol='features')

# Consolidate predictor columns
flights_assembled = assembler.____(____)

# Check the resulting column
flights_assembled.select('features', 'delay').show(5, truncate=False)
