
Assembling columns

The final stage of data preparation is to consolidate all of the predictor columns into a single column.

An updated version of the flights data, which takes into account all of the changes from the previous few exercises, has the following predictor columns:

  • mon, dom and dow
  • carrier_idx (indexed value from carrier)
  • org_idx (indexed value from org)
  • km
  • depart
  • duration
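To make "consolidating into a single column" concrete, here is a minimal plain-Python sketch of the idea. It is only an illustration, not Spark's implementation: `assemble` is a hypothetical helper, and the values for the sample flight are made up.

```python
# Predictor columns from the list above, in the order they will appear in the vector
PREDICTORS = ['mon', 'dom', 'dow', 'carrier_idx', 'org_idx', 'km', 'depart', 'duration']

def assemble(row, input_cols, output_col='features'):
    """Return a copy of `row` with the input columns gathered into one list of floats."""
    assembled = dict(row)  # copy, so the original row is left unchanged
    assembled[output_col] = [float(row[col]) for col in input_cols]
    return assembled

# Made-up values for a single flight (illustration only)
flight = {'mon': 0, 'dom': 22, 'dow': 2, 'carrier_idx': 0.0, 'org_idx': 0.0,
          'km': 3465, 'depart': 6.5, 'duration': 370, 'delay': 27}

flight_assembled = assemble(flight, PREDICTORS)
print(flight_assembled['features'])
```

Each row ends up with one `features` entry holding all predictor values in a fixed order, while the other columns, such as `delay`, are left untouched.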

Note: The truncate=False argument to the show() method prevents long values from being truncated in the output.

This exercise is part of the course

Machine Learning with PySpark

Exercise instructions

  • Import the class which will assemble the predictors.
  • Create an assembler object that will allow you to merge the predictor columns into a single column.
  • Use the assembler to generate a new consolidated column.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Import the necessary class
from pyspark.ml.feature import ____

# Create an assembler object
assembler = ____(inputCols=[
    ____
], outputCol='features')

# Consolidate predictor columns
flights_assembled = assembler.____(____)

# Check the resulting column
flights_assembled.select('features', 'delay').show(5, truncate=False)