Customizing trainControl
As you saw in the video, area under the ROC curve is a very useful, single-number summary of a model's ability to discriminate the positive from the negative class (e.g. mines from rocks). An AUC of 0.5 is no better than random guessing, an AUC of 1.0 is a perfectly predictive model, and an AUC of 0.0 is perfectly anti-predictive (which rarely happens).
This is often a much more useful metric than ranking models by their accuracy at a fixed threshold, because different models may need different classification thresholds, which you would otherwise have to find by inspecting a confusion matrix at each candidate threshold.
You can use the trainControl() function in caret to tune the parameters of your models using AUC instead of accuracy. The twoClassSummary() convenience function allows you to do this easily.
When using twoClassSummary(), be sure to always include the argument classProbs = TRUE, or your model will throw an error! (You cannot calculate AUC with just class predictions; you need class probabilities as well.)
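For instance, here is a minimal sketch of the mistake the warning describes (the exact error text depends on your version of caret):
library(caret)

# A control object like this will NOT work as intended: twoClassSummary()
# needs class probabilities to compute the ROC curve, but classProbs
# defaults to FALSE, so a later call to train() will stop with an error.
bad_control <- trainControl(
  method = "cv",
  number = 10,
  summaryFunction = twoClassSummary
)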
Exercise instructions
- Customize the trainControl object to use twoClassSummary rather than defaultSummary.
- Use 10-fold cross-validation.
- Be sure to tell trainControl() to return class probabilities.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create trainControl object: myControl
myControl <- trainControl(
method = "cv",
number = ___,
summaryFunction = defaultSummary,
classProbs = ___, # IMPORTANT!
verboseIter = TRUE
)
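One possible completed version is sketched below. The Sonar data from the mlbench package and the glm model are used purely as an illustrative two-class (mines vs. rocks) example, not as the course's own solution; passing metric = "ROC" to train() tells caret to evaluate models by AUC rather than accuracy.
library(caret)
library(mlbench)
data(Sonar)  # two-class data: mines ("M") vs. rocks ("R")

# One way the blanks might be filled in, following the instructions above
myControl <- trainControl(
  method = "cv",
  number = 10,                        # 10-fold cross-validation
  summaryFunction = twoClassSummary,  # rank models by ROC AUC
  classProbs = TRUE,                  # IMPORTANT: required by twoClassSummary
  verboseIter = TRUE
)

# Using the control object: metric = "ROC" makes train() select by AUC
model <- train(
  Class ~ ., data = Sonar,
  method = "glm",
  metric = "ROC",
  trControl = myControl
)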