10-fold cross-validation
As you saw in the video, a better approach to validating models is to use multiple systematic test sets rather than a single random train/test split. Fortunately, the caret package makes this very easy to do:
model <- train(y ~ ., my_data)
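To make the idea of multiple systematic test sets concrete, here is a minimal base R sketch of 10-fold cross-validation done by hand. It is only an illustration: the mtcars dataset, the formula mpg ~ wt + hp, and the seed are stand-ins chosen for this example, and caret's own fold construction may differ in detail.

```r
# Manual 10-fold cross-validation in base R (illustrative sketch only).
set.seed(42)                      # assumed seed, for reproducible fold assignment

data(mtcars)                      # stand-in dataset, not the one from the exercise
k <- 10

# Assign each row to one of k folds, shuffled so the folds are random but balanced
fold_id <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

# For each fold: fit on the other k - 1 folds, then score on the held-out fold
fold_rmse <- vapply(seq_len(k), function(i) {
  train_rows <- mtcars[fold_id != i, ]
  test_rows  <- mtcars[fold_id == i, ]
  fit  <- lm(mpg ~ wt + hp, data = train_rows)
  pred <- predict(fit, newdata = test_rows)
  sqrt(mean((test_rows$mpg - pred)^2))
}, numeric(1))

cv_rmse <- mean(fold_rmse)        # average out-of-sample error across the folds
print(fold_rmse)
print(cv_rmse)
```

Every row is used as test data exactly once, so the averaged error rests on ten held-out sets rather than on one arbitrary split.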
caret supports many types of cross-validation, and you can specify which type of cross-validation and the number of cross-validation folds with the trainControl() function, which you pass to the trControl argument in train():
model <- train(
  y ~ .,
  my_data,
  method = "lm",
  trControl = trainControl(
    method = "cv",
    number = 10,
    verboseIter = TRUE
  )
)
It's important to note that you pass the method for modeling to the main train() function and the method for cross-validation to the trainControl() function.
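The split between the two method arguments can be checked on a fitted object. The sketch below assumes the caret package is installed and uses the built-in mtcars dataset with the formula mpg ~ wt + hp as stand-ins; neither comes from the exercise itself.

```r
# Assumes the caret package is installed; mtcars is a stand-in dataset.
library(caret)

set.seed(42)  # assumed seed so the fold assignment is reproducible

model <- train(
  mpg ~ wt + hp,
  data = mtcars,
  method = "lm",                 # modeling method: passed to train()
  trControl = trainControl(
    method = "cv",               # resampling method: passed to trainControl()
    number = 10                  # number of cross-validation folds
  )
)

model$method            # the modeling method stored on the fitted object
model$control$method    # the resampling method
model$control$number    # the number of folds
nrow(model$resample)    # one row of metrics per held-out fold
model$results           # metrics averaged across the folds
```

Inspecting model$control is a quick way to confirm that the resampling settings were applied as intended before reading the reported error metrics.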
This exercise is part of the course
Machine Learning with caret in R
Exercise instructions
- Fit a linear regression to model price using all other variables in the diamonds dataset as predictors. Use the train() function and 10-fold cross-validation. (Note that we've taken a subset of the full diamonds dataset to speed up this operation, but it's still named diamonds.)
- Print the model to the console and examine the results.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Fit lm model using 10-fold CV: model
model <- train(
  ___,
  ___,
  method = "lm",
  trControl = trainControl(
    method = "cv",
    number = ___,
    verboseIter = TRUE
  )
)
# Print model to console