Cross-validation data frames

Now that you have withheld a portion of your data as testing data, you can use the remaining portion to find the best performing model.

In this exercise, you will split the training data into a series of 5 train-validate sets using the vfold_cv() function from the rsample package.

This exercise is part of the course

Machine Learning in the Tidyverse

Instructions

Build a data frame for 5-fold cross-validation from training_data using vfold_cv() and assign it to cv_split.
Prepare cv_data by appending two new columns to cv_split:
- train: the training data frames, obtained by mapping training() across the splits column.
- validate: the validation data frames, obtained by mapping testing() across the splits column.
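
As a rough illustration of the pattern described in these instructions, here is a minimal sketch on a different data set. It assumes the built-in mtcars data as a stand-in for training_data and uses 4 folds rather than 5; the names toy_split, toy_cv, train_n and validate_n are hypothetical and not part of the exercise.

```r
library(rsample)
library(dplyr)
library(purrr)

set.seed(42)

# Assumption: mtcars stands in for training_data, with 4 folds for illustration
toy_split <- vfold_cv(mtcars, v = 4)

toy_cv <- toy_split %>%
  mutate(
    # training() returns the rows a model would be fit on for each fold
    train = map(splits, ~training(.x)),
    # testing() returns the rows held out for validation in each fold
    validate = map(splits, ~testing(.x))
  )

# Row counts per fold: train and validate together cover the full data set
train_n <- map_int(toy_cv$train, nrow)
validate_n <- map_int(toy_cv$validate, nrow)
```

In every fold the two counts should add up to the number of rows in the original data, which is a quick sanity check that no observation was dropped or duplicated within a split.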

Hands-on interactive exercise

Try this exercise by completing this sample code.

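# Load rsample for vfold_cv(), training() and testing(),
# and the tidyverse for %>%, mutate() and map()
library(rsample)
library(tidyverse)

# Fix the random seed so the fold assignment is reproducible
set.seed(42)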

# Prepare the data frame containing the cross validation partitions
cv_split <- vfold_cv(___, v = ___)

cv_data <- cv_split %>% 
  mutate(
    # Extract the train data frame for each split
    train = map(___, ~___(.x)), 
    # Extract the validate data frame for each split
    validate = map(___, ~___(.x))
  )

# Use head() to preview cv_data
head(cv_data)
