
Cross-validation data frames

Now that you have withheld a portion of your data as testing data, you can use the remaining portion to find the best performing model.

In this exercise, you will split the training data into a series of 5 train-validate sets using the vfold_cv() function from the rsample package.
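To make the idea of 5 train-validate sets concrete, here is a minimal sketch of what vfold_cv() produces. It assumes the built-in mtcars data as a hypothetical stand-in for the training data; the names toy_training, row_id, folds, and held_out are illustrative and not part of the exercise.

```r
library(rsample)

set.seed(42)

# Hypothetical stand-in for the training data (assumption: built-in mtcars)
toy_training <- mtcars
toy_training$row_id <- seq_len(nrow(toy_training))

# Five folds: each row of `folds` holds one train-validate partition
folds <- vfold_cv(toy_training, v = 5)
nrow(folds)

# Collect the row ids held out for validation in each fold
held_out <- unlist(lapply(folds$splits, function(s) assessment(s)$row_id))

# Every observation is held out exactly once across the five folds
all(sort(held_out) == toy_training$row_id)
```

Each fold trains on roughly four fifths of the rows and validates on the remaining fifth, so every observation is used for validation once.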

This exercise is part of the course

Machine Learning in the Tidyverse


Exercise instructions

  • Build a data frame for 5-fold cross-validation from training_data using vfold_cv() and assign it to cv_split.
  • Prepare cv_data by appending two new columns to cv_split:
    • train: containing the train data frames by mapping training() across the splits column.
    • validate: containing the validate data frames by mapping testing() across the splits column.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

set.seed(42)

# Prepare the data frame containing the cross validation partitions
cv_split <- vfold_cv(___, v = ___)

cv_data <- cv_split %>% 
  mutate(
    # Extract the train data frame for each split
    train = map(___, ~___(.x)), 
    # Extract the validate data frame for each split
    validate = map(___, ~___(.x))
  )

# Use head() to preview cv_data
head(cv_data)
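
For reference, one possible completion of the template is sketched below. It assumes that rsample, dplyr, and purrr are attached and uses the built-in mtcars data as a hypothetical stand-in for training_data, which the exercise environment normally provides.

```r
library(rsample)
library(dplyr)
library(purrr)

set.seed(42)

# Assumption: mtcars stands in for the course's training_data
training_data <- mtcars

# Prepare the data frame containing the cross validation partitions
cv_split <- vfold_cv(training_data, v = 5)

cv_data <- cv_split %>%
  mutate(
    # Extract the train data frame for each split
    train = map(splits, ~training(.x)),
    # Extract the validate data frame for each split
    validate = map(splits, ~testing(.x))
  )

# Use head() to preview cv_data
head(cv_data)
```

The train and validate columns are list columns, so each cell holds a full data frame. In rsample, analysis() and assessment() are the accessors usually applied to resampling splits and could be used here in place of training() and testing().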