
Customizing trainControl

As you saw in the video, area under the ROC curve is a very useful, single-number summary of a model's ability to discriminate the positive from the negative class (e.g. mines from rocks). An AUC of 0.5 is no better than random guessing, an AUC of 1.0 is a perfectly predictive model, and an AUC of 0.0 is perfectly anti-predictive (which rarely happens).
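
For a quick intuition, you can compute AUC directly from predicted class probabilities. The sketch below assumes the pROC package (the same package caret's twoClassSummary() relies on) and a handful of made-up labels and probabilities, purely for illustration.

# A minimal sketch of computing AUC from class probabilities, assuming pROC.
library(pROC)

actual <- factor(c("M", "R", "M", "R", "M", "R"))  # hypothetical true classes
prob_M <- c(0.90, 0.20, 0.70, 0.40, 0.60, 0.10)    # hypothetical P(class == "M")

roc_obj <- roc(response = actual, predictor = prob_M, levels = c("R", "M"))
auc(roc_obj)  # 0.5 = random guessing, 1.0 = perfect separation (as here)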

AUC is often a more useful metric than simply ranking models by their accuracy at a fixed threshold, because different models may require different calibration steps (e.g. examining a confusion matrix at each candidate threshold) to find the classification threshold that works best for that model.

You can use the trainControl() function in caret to use AUC (instead of accuracy) to tune the parameters of your models. The twoClassSummary() convenience function allows you to do this easily.

When using twoClassSummary(), always include the argument classProbs = TRUE, or your model will throw an error! (You cannot calculate AUC from class predictions alone; you also need class probabilities.)
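
Putting the pieces together, an AUC-driven fit looks roughly like the sketch below. It assumes the Sonar data from the mlbench package (the mines-vs-rocks example) and an illustrative glm model, neither of which is prescribed by this exercise; the key parts are summaryFunction = twoClassSummary, classProbs = TRUE, and metric = "ROC" in train().

# A sketch of tuning on AUC, assuming the Sonar data (mlbench) and a glm model.
library(caret)
library(mlbench)
data(Sonar)

ctrl <- trainControl(
  method = "cv",
  number = 10,
  summaryFunction = twoClassSummary,
  classProbs = TRUE   # class probabilities are required to compute AUC
)

model <- train(
  Class ~ ., data = Sonar,
  method = "glm",
  metric = "ROC",     # select the model by AUC rather than accuracy
  trControl = ctrl
)
model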

This exercise is part of the course Machine Learning with caret in R.

Exercise instructions

  • Customize the trainControl object to use twoClassSummary rather than defaultSummary.
  • Use 10-fold cross-validation.
  • Be sure to tell trainControl() to return class probabilities.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create trainControl object: myControl
myControl <- trainControl(
  method = "cv",
  number = ___,
  summaryFunction = defaultSummary,
  classProbs = ___, # IMPORTANT!
  verboseIter = TRUE
)
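
One way to complete the sample code, following the instructions above (the filled-in values are implied by the exercise instructions, not an official solution):

# Create trainControl object: myControl
myControl <- trainControl(
  method = "cv",
  number = 10,                        # 10-fold cross-validation
  summaryFunction = twoClassSummary,  # report ROC (AUC), sensitivity, specificity
  classProbs = TRUE,                  # IMPORTANT! needed to compute AUC
  verboseIter = TRUE
)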