Cross-validation data frames
Now that you have withheld a portion of your data as testing data, you can use the remaining portion to find the best performing model.
In this exercise, you will split the training data into a series of 5 train-validate sets using the vfold_cv() function from the rsample package.
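To make the idea of "5 train-validate sets" concrete, here is a minimal sketch that runs vfold_cv() on the built-in mtcars data set. mtcars is only a stand-in chosen for illustration (it is not the course's training_data), and the names folds, fold_sizes and validated_rows are hypothetical.

```r
library(rsample)

set.seed(42)

# Illustrative stand-in data: mtcars has 32 rows.
folds <- vfold_cv(mtcars, v = 5)

# One row per fold; each element of `splits` describes one train/validate partition.
fold_sizes <- data.frame(
  id         = folds$id,
  n_train    = vapply(folds$splits, function(s) nrow(training(s)), integer(1)),
  n_validate = vapply(folds$splits, function(s) nrow(testing(s)), integer(1))
)
print(fold_sizes)

# Collect the row names that land in each validation set.
validated_rows <- unlist(lapply(folds$splits, function(s) rownames(testing(s))))

# Every original row should be used for validation exactly once across the folds.
print(anyDuplicated(validated_rows) == 0)
print(length(validated_rows) == nrow(mtcars))
```

In each fold, the train and validate portions together should account for every row of the input, and the validate portions of the five folds should not overlap.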
This exercise is part of the course
Machine Learning in the Tidyverse
Exercise instructions
- Build a data frame for 5-fold cross-validation from training_data using vfold_cv() and assign it to cv_split.
- Prepare cv_data by appending two new columns to cv_split:
  - train: containing the train data frames, obtained by mapping training() across the splits column.
  - validate: containing the validate data frames, obtained by mapping testing() across the splits column.
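The list-column pattern described in these steps can be sketched on a different data set. The example below again uses mtcars purely as an assumed stand-in, with the hypothetical names example_split and example_data, so it illustrates the technique rather than giving the exercise's graded solution.

```r
library(rsample)
library(dplyr)
library(purrr)

set.seed(42)

# Illustrative stand-in data, not the course's training_data.
example_split <- vfold_cv(mtcars, v = 5)

example_data <- example_split %>%
  mutate(
    # Each element of `splits` is a split object; training() returns its train rows.
    train = map(splits, ~training(.x)),
    # testing() returns the rows held out for validation in that fold.
    validate = map(splits, ~testing(.x))
  )

# `train` and `validate` are list-columns: one data frame per fold.
print(names(example_data))
print(map_int(example_data$train, nrow))
print(map_int(example_data$validate, nrow))
```

Keeping the train and validate data frames as list-columns next to their split means later steps (fitting a model per fold, predicting on the matching validate set) can be written as further map() calls inside mutate(). rsample also provides analysis() and assessment() as accessors for resampling splits, which may be preferable in newer code.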
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
set.seed(42)

# Prepare the data frame containing the cross validation partitions
cv_split <- vfold_cv(___, v = ___)

cv_data <- cv_split %>%
  mutate(
    # Extract the train data frame for each split
    train = map(___, ~___(.x)),
    # Extract the validate data frame for each split
    validate = map(___, ~___(.x))
  )

# Use head() to preview cv_data
head(cv_data)