Cross-validation data frames
Now that you have withheld a portion of your data as testing data, you can use the remaining portion (the training data) to find the best-performing model.
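For context, one common way to withhold testing data with rsample is initial_split() followed by training() and testing(). The sketch below assumes mtcars as a stand-in dataset, a 75/25 split, and a hypothetical row_id column added only to check that the two portions do not overlap.

```r
library(rsample)

set.seed(42)
# Assumption: mtcars stands in for the course's dataset;
# row_id is a hypothetical helper column used only for checking
cars <- mtcars
cars$row_id <- seq_len(nrow(cars))

# Withhold roughly 25% of the rows as testing data
split <- initial_split(cars, prop = 0.75)
training_data <- training(split)
testing_data  <- testing(split)

# Every row lands in exactly one of the two portions
nrow(training_data) + nrow(testing_data)  # 32
```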
In this exercise, you will split the training data into a series of 5 train-validate sets using the vfold_cv() function from the rsample package.
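A minimal sketch of what vfold_cv() produces, assuming mtcars as a stand-in for training_data and a hypothetical row_id column: the result has one row per fold, and across the 5 folds every observation is held out for validation exactly once. Here assessment() returns the held-out rows of each split.

```r
library(rsample)
library(purrr)

set.seed(42)
# Assumption: mtcars stands in for training_data; row_id is hypothetical
training_data <- mtcars
training_data$row_id <- seq_len(nrow(training_data))

cv_split <- vfold_cv(training_data, v = 5)

# One row per fold: a list column of rsplit objects plus a fold label
nrow(cv_split)  # 5

# Collect the held-out (assessment) rows of every fold
held_out <- unlist(map(cv_split$splits, ~ assessment(.x)$row_id))

# Each observation is held out exactly once across the 5 folds
all(sort(held_out) == training_data$row_id)  # TRUE
```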
This exercise is part of the course Machine Learning in the Tidyverse.
Instructions
- Build a data frame for 5-fold cross-validation from training_data using vfold_cv() and assign it to cv_split.
- Prepare cv_data by appending two new columns to cv_split:
  - train: the train data frames, obtained by mapping training() across the splits column.
  - validate: the validate data frames, obtained by mapping testing() across the splits column.
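The steps above can be sketched on a stand-in dataset. This assumes mtcars in place of the course's data and a 75/25 initial split; the train_n and validate_n columns are hypothetical additions used only to confirm that, within each fold, the train and validate data frames together cover all of training_data.

```r
library(rsample)
library(dplyr)
library(purrr)

set.seed(42)
# Assumption: mtcars stands in for the course's training_data
training_data <- training(initial_split(mtcars, prop = 0.75))

cv_split <- vfold_cv(training_data, v = 5)

cv_data <- cv_split %>%
  mutate(
    # training() returns the analysis portion of each rsplit
    train = map(splits, ~ training(.x)),
    # testing() returns the assessment (held-out) portion
    validate = map(splits, ~ testing(.x)),
    # Hypothetical check columns: row counts per fold
    train_n = map_int(train, nrow),
    validate_n = map_int(validate, nrow)
  )

cv_data %>% select(id, train_n, validate_n)
```

Because map() returns a list, train and validate are list columns, so each row of cv_data carries a whole data frame per fold.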
Hands-on interactive exercise
Try this exercise by completing this sample code.
library(rsample)  # vfold_cv(), training(), testing()
library(dplyr)    # mutate(), %>%
library(purrr)    # map()

set.seed(42)
# Prepare the data frame containing the cross validation partitions
cv_split <- vfold_cv(___, v = ___)
cv_data <- cv_split %>%
  mutate(
    # Extract the train data frame for each split
    train = map(___, ~___(.x)),
    # Extract the validate data frame for each split
    validate = map(___, ~___(.x))
  )
# Use head() to preview cv_data
head(cv_data)