Assembling columns
The final stage of data preparation is to consolidate all of the predictor columns into a single column.
An updated version of the flights data, which takes into account all of the changes from the previous few exercises, has the following predictor columns:
- mon, dom and dow
- carrier_idx (indexed value from carrier)
- org_idx (indexed value from org)
- km
- depart
- duration
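Note: By default the show() method truncates values longer than 20 characters. Passing truncate=False displays each value in full, which is useful here because the assembled column holds long vectors.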
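To make "consolidating" concrete: for each row, the predictor values are placed, in a fixed order, into one vector. The following is a plain-Python sketch of that idea only, not the Spark API. The assemble() helper and the row values are hypothetical, invented for illustration.

```python
# Predictor column names, in the order listed above.
predictors = ['mon', 'dom', 'dow', 'carrier_idx', 'org_idx', 'km', 'depart', 'duration']

# A single hypothetical row; the values are made up for illustration.
row = {'mon': 0, 'dom': 22, 'dow': 2, 'carrier_idx': 0.0, 'org_idx': 1.0,
       'km': 3465.0, 'depart': 9.5, 'duration': 230, 'delay': 14}

def assemble(row, input_cols):
    """Return the values of input_cols as one list of floats, preserving column order."""
    return [float(row[col]) for col in input_cols]

features = assemble(row, predictors)
print(features)  # [0.0, 22.0, 2.0, 0.0, 1.0, 3465.0, 9.5, 230.0]
```

The target column (delay) is deliberately left out of the vector: only predictors are consolidated, and the target stays in its own column.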
This exercise is part of the course
Machine Learning with PySpark
Exercise instructions
- Import the class which will assemble the predictors.
- Create an assembler object that will allow you to merge the predictor columns into a single column.
- Use the assembler to generate a new consolidated column.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import the necessary class
from pyspark.ml.feature import ____
# Create an assembler object
assembler = ____(inputCols=[
____
], outputCol='features')
# Consolidate predictor columns
flights_assembled = assembler.____(____)
# Check the resulting column
flights_assembled.select('features', 'delay').show(5, truncate=False)