Create a cross validation plan
There are several ways to implement an n-fold cross validation plan. In this exercise, you will create such a plan using vtreat::kWayCrossValidation(), and examine it.
kWayCrossValidation() creates a cross validation plan with the following call:
splitPlan <- kWayCrossValidation(nRows, nSplits, dframe, y)
where nRows is the number of rows of data to be split, and nSplits is the desired number of cross-validation folds.
Strictly speaking, dframe and y aren't used by kWayCrossValidation; they are there for compatibility with other vtreat data partitioning functions. You can set them both to NULL.
The resulting splitPlan is a list of nSplits elements; each element contains two vectors:
- train: the indices of dframe that will form the training set
- app: the indices of dframe that will form the test (or application) set
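As a rough illustration of that structure, the sketch below builds a plan for a made-up row count and checks how the train and app indices relate. The values nRows = 10 and the seed are assumptions chosen for readability, not values from the exercise; the seed is there only because the fold assignment is random.

```r
library(vtreat)

# Assumption: 10 rows and 3 folds, chosen only to keep the output small
nRows <- 10
nSplits <- 3

set.seed(2024)  # fold assignment is random; a seed makes it repeatable
splitPlan <- kWayCrossValidation(nRows, nSplits, NULL, NULL)

# Report the size of the train and app index vectors in each fold
for (i in seq_along(splitPlan)) {
  fold <- splitPlan[[i]]
  cat("fold", i, "- train:", length(fold$train),
      "app:", length(fold$app), "\n")
}

# Within a fold, train and app are disjoint and together cover every row
fold1 <- splitPlan[[1]]
disjoint <- length(intersect(fold1$train, fold1$app)) == 0
complete <- setequal(c(fold1$train, fold1$app), seq_len(nRows))

# Across folds, each row lands in exactly one app set
allApp <- unlist(lapply(splitPlan, function(fold) fold$app))
eachRowOnce <- length(allApp) == nRows && setequal(allApp, seq_len(nRows))
```

Each row serves as application data in exactly one fold and as training data in the others, which is what lets every row receive an out-of-sample prediction.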
In this exercise, you will create a 3-fold cross-validation plan for the dataset mpg.
This exercise is part of the course Supervised Learning in R: Regression.
Exercise instructions
- Load the package vtreat.
- Get the number of rows in mpg and assign it to the variable nRows.
- Call kWayCrossValidation to create a 3-fold cross validation plan and assign it to the variable splitPlan.
  - You can set the last two arguments of the function to NULL.
- Call str() to examine the structure of splitPlan.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Load the package vtreat
___
# mpg is available
summary(mpg)
# Get the number of rows in mpg
nRows <- ___
# Implement the 3-fold cross-validation plan with vtreat
splitPlan <- ___
# Examine the split plan
___
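One possible completion of the template is sketched below. It assumes that mpg is the dataset shipped with the ggplot2 package, which the exercise does not state; in the exercise environment mpg is already available, so the data() line would not be needed there.

```r
# Load the package vtreat
library(vtreat)

# Assumption: mpg is the ggplot2 dataset; the exercise preloads it
data("mpg", package = "ggplot2")
summary(mpg)

# Get the number of rows in mpg
nRows <- nrow(mpg)

# Build the 3-fold cross-validation plan; dframe and y can be NULL
splitPlan <- kWayCrossValidation(nRows, 3, NULL, NULL)

# Examine the split plan
str(splitPlan)
```

Because rows are assigned to folds at random, the indices printed by str() will differ from run to run unless a seed is set first.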