Reusing a trainControl

1. Reusing a trainControl

In this chapter, we will work on a more realistic dataset: customer churn at a telecom company.

2. A real-world example

We will fit a couple of different predictive models, then compare them and choose the best one. To make a proper apples-to-apples comparison between models, we need to explicitly define the training and test folds and make sure each model uses exactly the same split for each fold. We can do this by pre-defining a trainControl object, which explicitly specifies which rows are used for model building and which are used as holdouts. This trainControl object can then be used across multiple models.

3. Example: customer churn data

Before we start modeling, let's load the customer churn data from the C50 package in R. Then we can summarize the target variable and find that about 14% of the customers churned.
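A minimal sketch of that loading step, assuming an installed C50 release that still bundles the churn data (newer releases may not; the same data appears to be available as mlc_churn in the modeldata package). The variable name churn_rate is only illustrative.

```r
# Assumption: this C50 version ships the churn data, which provides
# the data frames churnTrain and churnTest.
library(C50)
data(churn)

# Share of customers in each class of the target variable
churn_rate <- prop.table(table(churnTrain$churn))
print(churn_rate)
```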

4. Example: customer churn data

Next, we make train/test indexes for cross-validation using caret's createFolds function. Note that these folds preserve the class distribution: the first fold has about a 14% churn rate.
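The fold-creation step might look like the sketch below. To keep it self-contained it uses a simulated outcome vector (a hypothetical stand-in for churnTrain$churn with roughly 14% "yes"), and the seed and k = 5 are arbitrary choices. It passes returnTrain = TRUE because the index argument of trainControl expects the rows used for training; by default createFolds returns the held-out rows instead.

```r
library(caret)

set.seed(42)
# Hypothetical stand-in for churnTrain$churn: about 14% "yes"
churn <- factor(
  sample(c("yes", "no"), 1000, replace = TRUE, prob = c(0.14, 0.86)),
  levels = c("yes", "no")
)

# Five stratified folds; each list element holds the TRAINING row numbers
myFolds <- createFolds(churn, k = 5, returnTrain = TRUE)

# Rows held out in the first fold, and their class balance
holdout_rows <- setdiff(seq_along(churn), myFolds[[1]])
print(prop.table(table(churn[holdout_rows])))
```

Because createFolds samples within each class, the class balance in every fold should stay close to the overall balance.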

5. Example: customer churn data

Now, we use these folds to create a trainControl object, which we can reuse to fit multiple models. Each model fit with this trainControl will have exactly the same cross-validation folds. This will allow us to compare these models later and be sure we are making a fair comparison.
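As a rough illustration of the reuse, the sketch below builds one trainControl from a set of folds and passes it to two train calls. The simulated data frame toy, the choice of the "glm" and "rpart" methods, and the ROC metric are all assumptions made for this example, not part of the churn walkthrough.

```r
library(caret)

set.seed(42)
# Hypothetical toy data standing in for the churn data
n <- 500
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$churn <- factor(
  ifelse(toy$x1 + rnorm(n) > 1.5, "yes", "no"),
  levels = c("yes", "no")
)

# Training-row indexes for five stratified folds
myFolds <- createFolds(toy$churn, k = 5, returnTrain = TRUE)

# One control object that fixes the folds for every model that uses it
myControl <- trainControl(
  summaryFunction = twoClassSummary,  # reports ROC, Sens, Spec
  classProbs = TRUE,                  # needed for ROC
  savePredictions = TRUE,
  index = myFolds
)

# Two different models resampled on exactly the same rows
model_glm <- train(churn ~ ., data = toy, method = "glm",
                   metric = "ROC", trControl = myControl)
model_rpart <- train(churn ~ ., data = toy, method = "rpart",
                     metric = "ROC", trControl = myControl)

# Check that both fits used identical training indexes
same_folds <- identical(model_glm$control$index, model_rpart$control$index)
print(same_folds)
```

Since both models share the same index list, their resampled metrics are computed on the same holdout rows and can be compared fold by fold.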

6. Let’s practice!

Let's practice making trainControl objects for multiple models.
