Customizing trainControl
As you saw in the video, area under the ROC curve is a very useful, single-number summary of a model's ability to discriminate the positive from the negative class (e.g. mines from rocks). An AUC of 0.5 is no better than random guessing, an AUC of 1.0 is a perfectly predictive model, and an AUC of 0.0 is perfectly anti-predictive (which rarely happens).
This is often a much more useful metric than ranking models by their accuracy at a fixed threshold, because different models may need different classification thresholds, which you would otherwise have to find by inspecting a confusion matrix at each candidate threshold.
You can use the trainControl() function in caret to tune the parameters of your models using AUC instead of accuracy. The twoClassSummary() convenience function allows you to do this easily.
When using twoClassSummary(), be sure to always include the argument classProbs = TRUE, or your model will throw an error! (You cannot calculate AUC with just class predictions; you need class probabilities as well.)
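For instance, here is a minimal sketch of the mistake the warning describes (the exact error text depends on your version of caret):
library(caret)

# A control object like this will NOT work as intended: twoClassSummary()
# needs class probabilities to compute the ROC curve, but classProbs
# defaults to FALSE, so a later call to train() will stop with an error.
bad_control <- trainControl(
  method = "cv",
  number = 10,
  summaryFunction = twoClassSummary
)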
Exercise instructions
- Customize the trainControl object to use twoClassSummary rather than defaultSummary.
- Use 10-fold cross-validation.
- Be sure to tell trainControl() to return class probabilities.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create trainControl object: myControl
myControl <- trainControl(
method = "cv",
number = ___,
summaryFunction = defaultSummary,
classProbs = ___, # IMPORTANT!
verboseIter = TRUE
)
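One possible completed version is sketched below. The Sonar data from the mlbench package and the glm model are used purely as an illustrative two-class (mines vs. rocks) example, not as the course's own solution; passing metric = "ROC" to train() tells caret to evaluate models by AUC rather than accuracy.
library(caret)
library(mlbench)
data(Sonar)  # two-class data: mines ("M") vs. rocks ("R")

# One way the blanks might be filled in, following the instructions above
myControl <- trainControl(
  method = "cv",
  number = 10,                        # 10-fold cross-validation
  summaryFunction = twoClassSummary,  # rank models by ROC AUC
  classProbs = TRUE,                  # IMPORTANT: required by twoClassSummary
  verboseIter = TRUE
)

# Using the control object: metric = "ROC" makes train() select by AUC
model <- train(
  Class ~ ., data = Sonar,
  method = "glm",
  metric = "ROC",
  trControl = myControl
)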