Exercise

# AUC computation

Say you have a binary classifier that is, in fact, just randomly making guesses. It would be correct approximately 50% of the time, and the resulting ROC curve would be a diagonal line in which the True Positive Rate and False Positive Rate are always equal. The area under this ROC curve would be 0.5. This is one way in which the AUC, which Hugo discussed in the video, is an informative metric for evaluating a model: if the AUC is greater than 0.5, the model is better than random guessing. Always a good sign!
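You can verify this claim empirically. In the sketch below, the labels and scores are both drawn at random (synthetic data, not the diabetes dataset from this exercise), so the scores carry no information about the labels and the AUC lands close to 0.5:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=10_000)  # random binary labels
y_scores = rng.random(10_000)             # a "classifier" that guesses at random

# The scores are unrelated to the labels, so the AUC is close to 0.5
print(roc_auc_score(y_true, y_scores))
```

With more samples, the AUC of a random scorer concentrates ever more tightly around 0.5.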

In this exercise, you'll calculate AUC scores using the `roc_auc_score()` function from `sklearn.metrics` as well as by performing cross-validation on the diabetes dataset.

`X` and `y`, along with training and test sets `X_train`, `X_test`, `y_train`, `y_test`, have been pre-loaded for you, and a logistic regression classifier `logreg` has been fit to the training data.

Instructions


- Import `roc_auc_score` from `sklearn.metrics` and `cross_val_score` from `sklearn.model_selection`.
- Using the `logreg` classifier, which has been fit to the training data, compute the predicted probabilities of the labels of the test set `X_test`. Save the result as `y_pred_prob`.
- Compute the AUC score using the `roc_auc_score()` function, the test set labels `y_test`, and the predicted probabilities `y_pred_prob`.
- Compute the AUC scores by performing 5-fold cross-validation. Use the `cross_val_score()` function and specify the `scoring` parameter to be `'roc_auc'`.
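The steps above can be sketched end to end. Since the pre-loaded diabetes arrays aren't available here, this example builds a synthetic stand-in with `make_classification`; the names `X_train`, `X_test`, `y_train`, `y_test`, and `logreg` mirror the exercise setup:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the pre-loaded diabetes data
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# In the exercise, logreg has already been fit for you
logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Predicted probabilities of the positive class for the test set
y_pred_prob = logreg.predict_proba(X_test)[:, 1]

# AUC on the held-out test set
print("AUC: {}".format(roc_auc_score(y_test, y_pred_prob)))

# AUC scores from 5-fold cross-validation
cv_auc = cross_val_score(logreg, X, y, cv=5, scoring='roc_auc')
print("AUC scores computed using 5-fold cross-validation: {}".format(cv_auc))
```

Note that `predict_proba` returns one column per class, so `[:, 1]` selects the probability of the positive class, which is what `roc_auc_score()` expects.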