Creating training and test datasets
Splitting a dataset into training and test sets is an important step in building and testing a classification model. The training set is used to build the model and the test set to evaluate its predictive accuracy.
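Predictive accuracy on the test set is usually reported as the share of test observations whose class is predicted correctly. The minimal R sketch below shows that calculation. The label vectors actual and predicted are hypothetical stand-ins for a real model's output on testset.

```r
# Hypothetical true and predicted class labels for a small test set
actual    <- factor(c("a", "b", "a", "a", "b"))
predicted <- factor(c("a", "b", "b", "a", "b"))

# Accuracy: proportion of test observations whose predicted class matches the true class
accuracy <- mean(predicted == actual)
accuracy  # 0.8
```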
In this exercise, you will split the dataset you created in the previous chapter into training and test sets. The dataset has been loaded into the data frame df, and a seed has already been set to ensure reproducibility. Recall that in the previous video we set the upper bound for the size of the training set using a few handy functions; now it's your turn to implement them!
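The preset seed is what makes the random split reproducible: resetting the same seed before calling sample() returns the same indices every time. The short R sketch below demonstrates this. The seed value 42 and the toy range 1 to 10 are arbitrary choices for illustration, not the values used in the exercise.

```r
# Draw 8 of 10 row indices twice, resetting the same (arbitrary) seed each time
set.seed(42)
first_draw <- sample(seq_len(10), size = 8)

set.seed(42)
second_draw <- sample(seq_len(10), size = 8)

# Same seed, same call: the two draws are identical
identical(first_draw, second_draw)  # TRUE
```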
This exercise is part of the course
Support Vector Machines in R
Exercise instructions
- Determine the upper bound for the number of rows in the training set and store it in sample_size.
- Create the vector train, which stores the randomly selected training-set row indices according to the 80/20 proportion.
- Assign the rows in the train vector to the data frame trainset and the remaining rows to the data frame testset.
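An "upper bound" on the training-set size can be computed by taking 80% of the row count and rounding down, so the training set never exceeds the 80/20 proportion even when 0.8 times the row count is not a whole number. The R sketch below assumes floor() is the rounding function intended; the row counts are hypothetical.

```r
# Hypothetical row counts; 0.8 reflects the 80/20 proportion
n_rows <- c(50, 101, 150)

# floor() rounds down, so the training set never exceeds 80% of the rows
floor(0.8 * n_rows)  # 40 80 120
```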
Interactive exercise
Try this exercise by completing the sample code.
# Set the upper bound for the length of the training set
sample_size <- ___(___ * nrow(df))
# Assign rows to training set randomly
train <- ___(seq_len(nrow(df)), size = ___)
# Yield training and test sets
trainset <- df[___, ]
testset <- df[-___, ]
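For reference, here is one possible end-to-end version of the split in R, run on a small stand-in data frame because the exercise's df is not available outside the course. The toy df, its columns x1, x2 and y, the seed, and the choice of floor() with 0.8 as the fill-ins are all assumptions for illustration.

```r
# Hypothetical stand-in for df: 50 rows, two numeric features and a class label
set.seed(1)
df <- data.frame(
  x1 = runif(50),
  x2 = runif(50),
  y  = factor(sample(c("a", "b"), 50, replace = TRUE))
)

# Upper bound: 80% of the rows, rounded down to a whole number
sample_size <- floor(0.8 * nrow(df))

# Randomly choose sample_size distinct row indices (sampling without replacement)
train <- sample(seq_len(nrow(df)), size = sample_size)

# Selected rows form the training set; negative indexing drops them for the test set
trainset <- df[train, ]
testset  <- df[-train, ]

nrow(trainset)  # 40
nrow(testset)   # 10
```

Because train holds positive row indices, df[-train, ] returns exactly the rows not chosen for training, so the two sets are disjoint and together cover every row of df.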