Evaluate the optimal tree
In this exercise, you'll evaluate the test set ROC AUC score of grid_dt's optimal model.
To do so, you will first determine the probability of obtaining the positive label for each test set observation. You can use the predict_proba() method of an sklearn classifier to compute a 2D array whose two columns contain the probabilities of the negative and positive class labels, respectively.
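As a rough illustration of that 2D array, the sketch below fits a small decision tree on synthetic data and inspects the output of predict_proba(). The dataset, the classifier settings, and the names clf, proba, and pos_proba are assumptions made for this example and are not part of the exercise workspace.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data, a stand-in for the exercise's dataset (assumption)
X, y = make_classification(n_samples=200, n_features=5, random_state=1)

# Hypothetical classifier; the exercise uses grid_dt's best estimator instead
clf = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, y)

# predict_proba returns an array of shape (n_samples, n_classes)
proba = clf.predict_proba(X)
print(clf.classes_)   # column order follows clf.classes_
print(proba.shape)

# With labels 0 and 1, column 1 holds the positive-class probability
pos_proba = proba[:, 1]
```

Each row of the array sums to 1, so for a binary problem the second column carries all the information needed for ROC AUC.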
The dataset is already loaded and processed for you (numerical features are standardized); it is split into 80% train and 20% test. X_test and y_test are available in your workspace. In addition, we have also loaded the trained GridSearchCV object grid_dt that you instantiated in the previous exercise. Note that grid_dt was trained as follows:
grid_dt.fit(X_train, y_train)
This exercise is part of the course
Machine Learning with Tree-Based Models in Python
Exercise instructions
Import roc_auc_score from sklearn.metrics.
Extract the .best_estimator_ attribute from grid_dt and assign it to best_model.
Predict the test set probabilities of obtaining the positive class, y_pred_proba.
Compute the test set ROC AUC score test_roc_auc of best_model.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import roc_auc_score from sklearn.metrics
____
# Extract the best estimator
best_model = ____
# Predict the test set probabilities of the positive class
y_pred_proba = ____
# Compute test_roc_auc
test_roc_auc = ____
# Print test_roc_auc
print('Test set ROC AUC score: {:.3f}'.format(test_roc_auc))
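One possible way to fill in the blanks is sketched below as a self-contained script. Because the exercise's dataset and the grid from the previous exercise are not shown here, the synthetic data, the params_dt grid, and the GridSearchCV settings are all assumptions; only the last four steps correspond to the template above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preloaded data, split 80% train / 20% test (assumption)
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

# Hypothetical hyperparameter grid; the course's actual grid is not shown here
params_dt = {"max_depth": [2, 3, 4], "min_samples_leaf": [0.1, 0.2]}
grid_dt = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=1),
    param_grid=params_dt,
    scoring="roc_auc",
    cv=5,
)
grid_dt.fit(X_train, y_train)

# Extract the best estimator (refit on the full training set by default)
best_model = grid_dt.best_estimator_

# Predict the test set probabilities of the positive class (column 1)
y_pred_proba = best_model.predict_proba(X_test)[:, 1]

# Compute test_roc_auc from the true labels and the predicted probabilities
test_roc_auc = roc_auc_score(y_test, y_pred_proba)

print('Test set ROC AUC score: {:.3f}'.format(test_roc_auc))
```

Note that roc_auc_score is given probabilities rather than hard class predictions, since the ROC curve is built by sweeping a threshold over the scores.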