Creating training and test datasets
Splitting a dataset into training and test sets is an important step in building and testing a classification model. The training set is used to build the model and the test set to evaluate its predictive accuracy.
In this exercise, you will split the dataset you created in the previous chapter into training and test sets. The dataset has been loaded in the data frame df, and a seed has already been set to ensure reproducibility. Recall that in the previous video, we set the upper bound for the length of the training set with some handy functions. Now it's your turn to implement them!
This exercise is part of the course
Support Vector Machines in R
Exercise instructions
- Determine the upper bound for the number of rows to be in the training set and store it in sample_size.
- Create the vector train, which stores the randomly assigned training set according to the 80/20 proportion.
- Assign the rows in the train vector to the data frame trainset and the rest to the data frame testset.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Set the upper bound for the length of the training set
sample_size <- ___(___ * nrow(df))
# Assign rows to training set randomly
train <- ___(seq_len(nrow(df)), size = ___)
# Yield training and test sets
trainset <- df[___, ]
testset <- df[-___, ]
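
One possible way to fill in the blanks is sketched below. It assumes the "handy functions" from the video are base R's floor() and sample(), which the exercise text does not confirm. The data frame df built here is a hypothetical stand-in so the code runs on its own; in the exercise, df is already loaded and the seed is already set.

```r
# Hypothetical stand-in for the preloaded df (200 rows, two predictors, one class label).
# In the actual exercise, df and the seed are already provided.
set.seed(42)
df <- data.frame(x1 = runif(200), x2 = runif(200))
df$y <- factor(ifelse(df$x1 - df$x2 > 0, 1, -1))

# Upper bound for the length of the training set: 80% of the rows, rounded down
sample_size <- floor(0.8 * nrow(df))

# Randomly draw sample_size row indices, without replacement, for the training set
train <- sample(seq_len(nrow(df)), size = sample_size)

# Rows indexed by train form the training set; negative indexing drops them for the test set
trainset <- df[train, ]
testset <- df[-train, ]
```

With this 200-row stand-in, trainset ends up with 160 rows and testset with the remaining 40. Rounding down with floor() keeps the training set from exceeding 80% when the row count does not divide evenly.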