Exercise

Assembling columns

The final stage of data preparation is to consolidate all of the predictor columns into a single column.

An updated version of the flights data, which takes into account all of the changes from the previous few exercises, has the following predictor columns:

  • mon, dom and dow
  • carrier_idx (indexed value from carrier)
  • org_idx (indexed value from org)
  • km
  • depart
  • duration

Note: Passing truncate=False to the show() method prevents long values from being truncated in the output.

Instructions

  • Import the class which will assemble the predictors.
  • Create an assembler object that will allow you to merge the predictor columns into a single column.
  • Use the assembler to generate a new consolidated column.