Evaluating Random Forest
In this final exercise you'll evaluate the results of cross-validation on a Random Forest model.
The following have already been created:
- cv: a cross-validator that has already been fit to the training data
- evaluator: a BinaryClassificationEvaluator object
- flights_test: the testing data
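BinaryClassificationEvaluator reports the area under the ROC curve by default (metricName="areaUnderROC"). As a rough, Spark-free illustration of what that number means, the sketch below computes ROC AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name roc_auc and the toy labels and scores are illustrative assumptions; Spark computes the area by integrating the ROC curve instead, so small differences are possible on large data.

```python
from itertools import product


def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison (the Mann-Whitney U statistic, normalised)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0   # positive ranked above negative
        elif p == n:
            wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): 1 = positive class, 0 = negative class
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.65, 0.7, 0.8]
print(roc_auc(labels, scores))  # 0.8333333333333334 (5 of 6 pairs ranked correctly)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.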
This exercise is part of the course Machine Learning with PySpark.
Exercise instructions
- Print a list of average AUC metrics across all models in the parameter grid.
- Display the average AUC for the best model. This will be the largest AUC in the list.
- Print an explanation of the maxDepth and featureSubsetStrategy parameters for the best model.
- Display the AUC for the best model's predictions on the testing data.
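In PySpark, the avgMetrics attribute of a fitted CrossValidatorModel holds one entry per parameter combination, in the same order as the parameter grid; each entry is that combination's metric averaged over the folds. The pure-Python sketch below mimics that bookkeeping so that "the largest AUC in the list" is concrete. The helper name average_metrics and the per-fold AUC values are made up for illustration.

```python
from statistics import mean


def average_metrics(fold_metrics):
    """Average each parameter combination's per-fold metrics.

    fold_metrics[i][j] is the metric for parameter combination i on fold j,
    so the result has one average per combination (analogous to avgMetrics).
    """
    return [mean(per_fold) for per_fold in fold_metrics]


# Hypothetical per-fold AUCs: 3 parameter combinations x 3 folds
fold_metrics = [
    [0.70, 0.72, 0.71],
    [0.75, 0.74, 0.76],
    [0.73, 0.73, 0.74],
]
avg = average_metrics(fold_metrics)

# For AUC, larger is better, so the best combination has the largest average
best_index = max(range(len(avg)), key=avg.__getitem__)
print(avg)
print(best_index, max(avg))
```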
Here is one way to complete the sample code:
# Average AUC for each parameter combination in the grid
print(cv.avgMetrics)
# Average AUC for the best model (the largest value in the list)
print(max(cv.avgMetrics))
# What's the optimal parameter value for maxDepth?
print(cv.bestModel.explainParam('maxDepth'))
# What's the optimal parameter value for featureSubsetStrategy?
print(cv.bestModel.explainParam('featureSubsetStrategy'))
# AUC for the best model on the testing data
# (transform() on the fitted cross-validator delegates to its best model)
print(evaluator.evaluate(cv.transform(flights_test)))
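explainParam returns a descriptive string. If you want the winning values as plain data instead, one option is to line avgMetrics up with the grid returned by getEstimatorParamMaps(), which a fitted CrossValidatorModel exposes. The helper best_param_values below is a hypothetical sketch of that idea, not part of the exercise; it assumes larger metric values are better (true for AUC, and reported by evaluator.isLargerBetter()). Because it only touches those two members, it is demonstrated here with small stand-in objects rather than a real Spark model.

```python
from collections import namedtuple


def best_param_values(cv_model, larger_is_better=True):
    """Return {param name: value} for the best parameter combination.

    Assumes cv_model behaves like a fitted pyspark.ml.tuning.CrossValidatorModel:
    avgMetrics[i] is the averaged metric for getEstimatorParamMaps()[i], and each
    param map is a dict keyed by Param objects that have a .name attribute.
    """
    metrics = cv_model.avgMetrics
    pick = max if larger_is_better else min
    best_index = pick(range(len(metrics)), key=metrics.__getitem__)
    param_map = cv_model.getEstimatorParamMaps()[best_index]
    return {param.name: value for param, value in param_map.items()}


# With the fitted model from this exercise (requires PySpark):
#   print(best_param_values(cv, evaluator.isLargerBetter()))

# Hypothetical stand-ins so the helper can be tried without Spark
FakeParam = namedtuple("FakeParam", "name")


class FakeCVModel:
    avgMetrics = [0.71, 0.75, 0.73]

    def getEstimatorParamMaps(self):
        depth, strategy = FakeParam("maxDepth"), FakeParam("featureSubsetStrategy")
        return [
            {depth: 5, strategy: "sqrt"},
            {depth: 10, strategy: "onethird"},
            {depth: 10, strategy: "sqrt"},
        ]


print(best_param_values(FakeCVModel()))
# {'maxDepth': 10, 'featureSubsetStrategy': 'onethird'}
```

Passing larger_is_better explicitly keeps the helper usable with metrics where smaller is better, such as an RMSE from a regression evaluator.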