Evaluating Random Forest

In this final exercise, you'll evaluate the results of cross-validation on a Random Forest model.

The following have already been created:

  • cv - a cross-validator that has already been fit to the training data,
  • evaluator - a BinaryClassificationEvaluator object, and
  • flights_test - the testing data.
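
For intuition about the number the evaluator reports: a BinaryClassificationEvaluator uses area under the ROC curve (AUC) as its default metric. The sketch below is a plain-Python illustration of one common reading of AUC, the fraction of (positive, negative) pairs in which the positive example gets the higher score. It is not how Spark computes the metric internally, and the function name, labels, and scores are hypothetical values chosen for illustration.

```python
def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0   # positive ranked above negative
            elif p == n:
                correct += 0.5   # a tie counts as half a correct pair
    return correct / (len(positives) * len(negatives))


# Hypothetical labels (1 = positive class) and predicted scores
labels = [1, 0, 1, 0]
scores = [0.9, 0.1, 0.4, 0.6]
print(pairwise_auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why a larger AUC identifies the better model.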

This exercise is part of the course

Machine Learning with PySpark

Exercise instructions

  • Print a list of average AUC metrics across all models in the parameter grid.
  • Display the average AUC for the best model. This will be the largest AUC in the list.
  • Print an explanation of the maxDepth and featureSubsetStrategy parameters for the best model.
  • Display the AUC for the best model predictions on the testing data.
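
The first two steps can be pictured with a small plain-Python sketch, assuming that cross-validation records one AUC per fold for every parameter combination and then averages them. The grid, the per-fold values, and all variable names below are hypothetical stand-ins, not output from the course data. On a fitted PySpark cross-validator, the averaged metrics are exposed as a list in the same order as the parameter grid, which is what makes the index lookup at the end meaningful.

```python
# Hypothetical parameter grid: one dict per combination
param_grid = [
    {"maxDepth": 2, "featureSubsetStrategy": "onethird"},
    {"maxDepth": 5, "featureSubsetStrategy": "onethird"},
    {"maxDepth": 2, "featureSubsetStrategy": "sqrt"},
    {"maxDepth": 5, "featureSubsetStrategy": "sqrt"},
]

# Hypothetical AUC values, one inner list per combination, one value per fold
fold_aucs = [
    [0.60, 0.62, 0.64],
    [0.66, 0.68, 0.70],
    [0.63, 0.65, 0.67],
    [0.61, 0.66, 0.65],
]

# Average AUC for each parameter combination in the grid
avg_metrics = [sum(folds) / len(folds) for folds in fold_aucs]
print(avg_metrics)

# The best model is the one with the largest average AUC
best_auc = max(avg_metrics)
best_index = avg_metrics.index(best_auc)
best_params = param_grid[best_index]
print(best_auc, best_params)
```

Note that the average AUC from cross-validation is computed on held-out folds of the training data, so the AUC on the testing data in the last step can differ from it.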

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Average AUC for each parameter combination in grid
print(cv.____)

# Average AUC for the best model
print(____(____))

# What's the optimal parameter value for maxDepth?
print(cv.____.explainParam('____'))
# What's the optimal parameter value for featureSubsetStrategy?
print(cv.____.____(____))

# AUC for best model on testing data
print(evaluator.____(____.____(____)))