Exercise

Dissecting the best flight duration model

You just set up a CrossValidator to find good parameters for the linear regression model predicting flight duration.

The model pipeline has multiple stages (objects of type StringIndexer, OneHotEncoder, VectorAssembler and LinearRegression), which operate in sequence. The stages are available as a list through the stages attribute on the pipeline object, and they are executed in the order in which they appear in that list.
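
To make the ordering point concrete, here is a toy sketch in plain Python. It is not the Spark API: ToyPipelineModel, add_one and double are hypothetical names invented for this illustration. It only mirrors the idea that the stages live in a list and that swapping their order changes the result.

```python
# Toy stand-in for a fitted pipeline. This is NOT the Spark API; it only
# mimics the "list of stages, run in list order" behaviour described above.
class ToyPipelineModel:
    def __init__(self, stages):
        # Stages are kept in a plain list; list order is execution order.
        self.stages = list(stages)

    def transform(self, data):
        # Feed the output of each stage into the next one.
        for stage in self.stages:
            data = stage(data)
        return data


def add_one(values):
    """Hypothetical stage: add 1 to every value."""
    return [v + 1 for v in values]


def double(values):
    """Hypothetical stage: multiply every value by 2."""
    return [v * 2 for v in values]


forward = ToyPipelineModel([add_one, double]).transform([1, 2, 3])
backward = ToyPipelineModel([double, add_one]).transform([1, 2, 3])

print(forward)   # [4, 6, 8]
print(backward)  # [3, 5, 7]
```

The same two stages give different outputs depending on their position in the list, which is why the encoding and assembling stages have to come before the regression stage in the real pipeline.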

Now you're going to take a closer look at the pipeline, split out its stages and use the best model to make predictions on the testing data.

The following objects have already been created:

  • cv — a trained CrossValidatorModel object and
  • evaluator — a RegressionEvaluator object.

The flights data have been randomly split into flights_train and flights_test.

Instructions

  • Retrieve the best model.
  • Look at the stages in the best model.
  • Isolate the linear regression stage and extract its parameters.
  • Use the best model to generate predictions on the testing data and calculate the RMSE.
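
One possible way to approach the steps above is sketched below. It is a hedged outline, not the reference solution. The function name dissect_best_model is hypothetical, and the sketch assumes that the linear regression is the last stage of the pipeline and that evaluator is configured to report RMSE (the default metric of RegressionEvaluator). It relies only on the bestModel attribute of a CrossValidatorModel, the stages attribute and transform() method of a PipelineModel, extractParamMap() on the regression stage, and evaluate() on the evaluator.

```python
def dissect_best_model(cv, flights_test, evaluator):
    """Sketch of the four steps; assumes cv is a fitted CrossValidatorModel."""
    # 1. Retrieve the best model (for a pipeline estimator, a PipelineModel).
    best_model = cv.bestModel

    # 2. Look at the stages, which are held in an ordered list.
    stages = best_model.stages

    # 3. Isolate the linear regression stage and extract its parameters.
    #    Assumption: the regression is the final stage of the pipeline.
    regression = stages[-1]
    params = regression.extractParamMap()

    # 4. Predict on the testing data and score the predictions.
    #    Assumption: evaluator is set up to return RMSE.
    predictions = best_model.transform(flights_test)
    rmse = evaluator.evaluate(predictions)

    return stages, params, rmse
```

With the objects listed earlier, the call would look like stages, params, rmse = dissect_best_model(cv, flights_test, evaluator). Taking the last stage rather than a fixed index keeps the sketch valid if stages are added in front of the regression.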