Create the pipeline
You're finally ready to create a Pipeline! Pipeline is a class in the pyspark.ml module that combines all the Estimators and Transformers that you've already created. This lets you reuse the same modeling process over and over again by wrapping it up in one simple object. Neat, right?
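To make the idea of chaining stages concrete, here is a minimal pure-Python sketch of how a pipeline-like object might fit and apply its stages in order. It is a toy illustration, not the pyspark.ml implementation: ToyIndexer, ToyIndexerModel, ToyPipeline and ToyPipelineModel are hypothetical names, the sample rows are made up, and the alphabetical index order is an arbitrary choice for this sketch.

```python
# Toy illustration only: these classes are hypothetical stand-ins,
# not the pyspark.ml classes.


class ToyIndexer:
    """Estimator-like stage: learns a label -> index mapping in fit()."""

    def __init__(self, column):
        self.column = column

    def fit(self, rows):
        # Alphabetical ordering is an arbitrary choice for this sketch.
        labels = sorted({row[self.column] for row in rows})
        mapping = {label: i for i, label in enumerate(labels)}
        return ToyIndexerModel(self.column, mapping)


class ToyIndexerModel:
    """Transformer-like stage: applies the mapping learned by ToyIndexer."""

    def __init__(self, column, mapping):
        self.column = column
        self.mapping = mapping

    def transform(self, rows):
        out_col = self.column + "_index"
        return [{**row, out_col: self.mapping[row[self.column]]} for row in rows]


class ToyPipeline:
    """Holds an ordered list of stages and fits them one after another."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            # Stages with fit() are fitted first; others are used as they are.
            model = stage.fit(rows) if hasattr(stage, "fit") else stage
            # Each stage sees the output of the stage before it.
            rows = model.transform(rows)
            fitted.append(model)
        return ToyPipelineModel(fitted)


class ToyPipelineModel:
    """The fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


# Made-up sample rows, for illustration only.
flights = [
    {"dest": "SEA", "carrier": "AS"},
    {"dest": "PDX", "carrier": "DL"},
    {"dest": "SEA", "carrier": "DL"},
]

toy_pipe = ToyPipeline(stages=[ToyIndexer("dest"), ToyIndexer("carrier")])
toy_model = toy_pipe.fit(flights)
result = toy_model.transform(flights)
```

The point of the sketch is the reuse: once the stages are wrapped in one object, the same sequence of steps can be fitted and applied with a single fit() call and a single transform() call.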
This exercise is part of the course Foundations of PySpark.
Exercise instructions
- Import Pipeline from pyspark.ml.
- Call the Pipeline() constructor with the keyword argument stages to create a Pipeline called flights_pipe. stages should be a list holding all the stages you want your data to go through in the pipeline. Here this is just: [dest_indexer, dest_encoder, carr_indexer, carr_encoder, vec_assembler]
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import Pipeline
from ____ import ____
# Make the pipeline
flights_pipe = Pipeline(stages=____)