Assembling columns
The final stage of data preparation is to consolidate all of the predictor columns into a single column.
An updated version of the flights data, which takes into account all of the changes from the previous few exercises, has the following predictor columns:
- mon, dom and dow
- carrier_idx (indexed value from carrier)
- org_idx (indexed value from org)
- km
- depart
- duration
Note: The truncate=False argument to the show() method prevents data being truncated in the output.
This exercise is part of the course Machine Learning with PySpark.
Exercise instructions
- Import the class which will assemble the predictors.
- Create an assembler object that will allow you to merge the predictor columns into a single column.
- Use the assembler to generate a new consolidated column.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import the necessary class
from pyspark.ml.feature import ____
# Create an assembler object
assembler = ____(inputCols=[
____
], outputCol='features')
# Consolidate predictor columns
flights_assembled = assembler.____(____)
# Check the resulting column
flights_assembled.select('features', 'delay').show(5, truncate=False)
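
To make the idea of a consolidated column concrete, here is a minimal plain-Python sketch of what assembling does to each row. It is an illustration only and does not use the PySpark API: the assemble() helper and the sample values are hypothetical, and only the column names come from the list of predictors above.

```python
# Plain-Python illustration of column assembly; this is NOT the PySpark API.
# The rows below hold made-up values, used purely for illustration.
rows = [
    {"mon": 0, "dom": 22, "dow": 2, "carrier_idx": 0.0, "org_idx": 2.0,
     "km": 3465.0, "depart": 16.33, "duration": 351, "delay": 30},
    {"mon": 2, "dom": 20, "dow": 4, "carrier_idx": 0.0, "org_idx": 0.0,
     "km": 509.0, "depart": 8.18, "duration": 82, "delay": -8},
]

# Predictor columns, in the order they should appear in the merged vector.
PREDICTORS = ["mon", "dom", "dow", "carrier_idx", "org_idx",
              "km", "depart", "duration"]


def assemble(row, input_cols, output_col="features"):
    """Return a copy of row with the input columns merged into one list.

    Hypothetical helper: the original columns are kept, and a new column
    holding all predictor values (as floats) is added alongside them.
    """
    assembled = dict(row)
    assembled[output_col] = [float(row[col]) for col in input_cols]
    return assembled


assembled_rows = [assemble(row, PREDICTORS) for row in rows]

# Show the consolidated column next to the target, one row per line.
for row in assembled_rows:
    print(row["features"], row["delay"])
```

Each row ends up with a single features entry holding every predictor value in a fixed order, next to the untouched delay column. That is the shape the final select('features', 'delay') call in the exercise displays.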