1. Comparing models
After fitting two (or more) models, the next step is deciding which one makes the best predictions on new data.
2. Comparing models
First of all, we have to make sure the models were fit on exactly the same cross-validation folds, so that we're making an apples-to-apples comparison of their results.
We want to pick the model with the highest average AUC across all 10 folds, but we also typically prefer a model with a low standard deviation in AUC, since that indicates more consistent performance.
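One way to guarantee identical folds is to create them once and pass them to every model via `trainControl()`. This is a sketch; `churn_y` is an assumed name for the outcome vector, and the folds would be reused in each call to `train()`.

```r
library(caret)

# Fix the fold assignments up front so every model is
# evaluated on the same train/test splits.
# churn_y is an assumed name for the binary outcome vector.
set.seed(42)
myFolds <- createFolds(churn_y, k = 10)

myControl <- trainControl(
  summaryFunction = twoClassSummary,  # reports ROC AUC, sens, spec
  classProbs = TRUE,                  # class probabilities needed for AUC
  savePredictions = TRUE,
  index = myFolds                     # identical folds for every model
)
```

Passing the same `myControl` object to each `train()` call is what makes the later comparison apples-to-apples.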
Fortunately, the caret package provides a handy function for collecting the results from multiple models. This function is called "resamples" and provides a variety of methods for assessing which of two (or more) models is the best for a given dataset.
3. Example: resamples() on churn data
Let's use the resamples function to compare our glmnet and random forest models on the churn dataset.
First, we make a list of models, and name each one for future reference.
Next, we collect all the results from all the different cross-validation folds using the resamples function.
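The two steps above can be sketched as follows; `glmnet_model` and `rf_model` are assumed names for models already fit with `train()` on identical folds.

```r
# Name each model in the list for future reference;
# these names label the rows of the comparison output.
model_list <- list(
  glmnet = glmnet_model,
  rf     = rf_model
)

# Collect the per-fold results from both models.
resamps <- resamples(model_list)
```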
4. Summarize the results
Finally, we can summarize the results using the summary function on the resamples object, and choose which model is the best on this dataset.
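A minimal sketch of this last step, assuming `glmnet_model` and `rf_model` are fitted caret models trained on the same folds:

```r
# Collect the per-fold results and summarize them.
# summary() prints the min, quartiles, mean, and max of each
# metric (e.g. ROC AUC) for every model, per fold.
resamps <- resamples(list(glmnet = glmnet_model, rf = rf_model))
summary(resamps)
```

From the summary table we would pick the model with the highest mean ROC, while also checking that its AUC spread across folds is reasonably tight.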
5. Let’s practice!
Let's practice with the resamples function.