Evaluating Random Forest

In this final exercise, you'll evaluate the results of cross-validation on a Random Forest model.

The following have already been created:

cv: a cross-validator that has already been fit to the training data (a CrossValidatorModel)
evaluator: a BinaryClassificationEvaluator object
flights_test: the testing data

This exercise is part of the course Machine Learning with PySpark.

Exercise instructions

Print a list of average AUC metrics across all models in the parameter grid.
Display the average AUC for the best model. This will be the largest AUC in the list.
Print an explanation of the maxDepth and featureSubsetStrategy parameters for the best model.
Display the AUC of the best model's predictions on the testing data.
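The first two instructions depend on how cross-validation aggregates its results. For every parameter combination in the grid, a model is fit and scored on each fold, and the fold scores are averaged into one entry of `avgMetrics`, in the same order as the grid. Because a larger AUC is better, the best model is the one with the largest average. The pure-Python sketch below mimics that bookkeeping without calling Spark. The grid values and the per-fold AUCs are hypothetical numbers chosen only for illustration, not the grid or results behind `cv`.

```python
from itertools import product
from statistics import mean

# Hypothetical grid values, for illustration only (not necessarily the grid behind `cv`).
max_depths = [1, 5]
strategies = ["sqrt", "log2"]

# One entry per parameter combination (Cartesian product of the value lists).
grid = list(product(max_depths, strategies))

# Hypothetical AUCs from 3 folds for each combination (made-up numbers).
fold_aucs = {
    (1, "sqrt"): [0.61, 0.63, 0.62],
    (1, "log2"): [0.60, 0.62, 0.61],
    (5, "sqrt"): [0.70, 0.72, 0.71],
    (5, "log2"): [0.68, 0.69, 0.70],
}

# Average the fold scores for each combination, kept in grid order.
avg_metrics = [mean(fold_aucs[params]) for params in grid]

# Larger AUC is better, so the best combination has the largest average.
best_index = max(range(len(grid)), key=lambda i: avg_metrics[i])
best_max_depth, best_strategy = grid[best_index]

print(avg_metrics)
print(max(avg_metrics))
print(best_max_depth, best_strategy)
```

Here `avg_metrics` plays the role of the fitted cross-validator's `avgMetrics` list, and `max(avg_metrics)` is the average AUC of the best parameter combination.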

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Average AUC for each parameter combination in grid
print(cv.____)

# Average AUC for the best model
print(____(____))

# What's the optimal parameter value for maxDepth?
print(cv.____.explainParam('____'))
# What's the optimal parameter value for featureSubsetStrategy?
print(cv.____.____(____))

# AUC for best model on testing data
print(evaluator.____(____.____(____)))
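Assuming `evaluator` was created with default settings, it reports `areaUnderROC`. The pure-Python sketch below shows what that number measures: the fraction of (positive, negative) example pairs in which the positive example receives the higher score, with ties counted as half. This pairwise count equals the area under the ROC curve computed with the trapezoidal rule. It is not Spark's implementation, which integrates the ROC curve directly and may down-sample it for large inputs. The helper name `area_under_roc`, the labels, and the scores are all hypothetical.

```python
def area_under_roc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # tie: half credit
    return wins / (len(positives) * len(negatives))


# Hypothetical test-set labels and predicted scores for class 1.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.3, 0.6, 0.4, 0.5, 0.1]

# 8 of the 9 positive/negative pairs are ranked correctly, so this prints 8/9 (about 0.889).
print(area_under_roc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. This is why the best model is the one with the largest value in the list of averages.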
