
Creating training and test datasets

Splitting a dataset into training and test sets is an important step in building and evaluating a classification model. The training set is used to fit the model, and the test set, which the model does not see during training, is used to estimate its predictive accuracy on new data.
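As a minimal illustration of what "predictive accuracy" means here, the sketch below computes it as the fraction of test rows whose predicted class matches the true class. The label vectors `predicted` and `actual` are hypothetical stand-ins; in practice they would come from a fitted model's predictions on the test set and the test set's known labels.

```r
# Hypothetical predicted and true class labels for a 5-row test set
predicted <- c("A", "B", "A", "A", "B")
actual    <- c("A", "B", "B", "A", "B")

# Accuracy = share of test rows where the prediction equals the true label
accuracy <- mean(predicted == actual)
print(accuracy)
```

Here four of the five predictions match, so the accuracy is 0.8.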

In this exercise, you will split the dataset you created in the previous chapter into training and test sets. The dataset has been loaded into the data frame df, and a seed has already been set to ensure reproducibility. Recall that in the previous video we used a couple of handy functions to set the upper bound on the number of rows in the training set; now it's your turn to implement them!
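Why an "upper bound" is needed: 80% of the row count is often not a whole number. The sketch below assumes the rounding function is `floor()` (a common choice, not confirmed by the text above) and uses a hypothetical row count of 103 in place of `nrow(df)`.

```r
# Hypothetical row count; in the exercise this would be nrow(df)
n_rows <- 103

# 80% of 103 rows is 82.4, which is not a valid number of rows
raw_size <- 0.8 * n_rows
print(raw_size)

# floor() rounds down, so the training set never exceeds 80% of the rows
sample_size <- floor(raw_size)
print(sample_size)
```

Rounding down rather than to the nearest integer guarantees that the training set is at most 80% of the data, which is what makes the result an upper bound.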

This exercise is part of the course

Support Vector Machines in R


Exercise instructions

  • Determine the upper bound for the number of rows in the training set and store it in sample_size.
  • Create the vector train, which stores the row indices randomly assigned to the training set according to the 80/20 proportion (80% training, 20% test).
  • Assign the rows indexed by train to the data frame trainset and the remaining rows to the data frame testset.
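The three steps can be sketched end to end on a small stand-in dataset. This is an illustration under assumptions, not the exercise solution: `toy_df` is a hypothetical 10-row data frame, the seed value 42 is arbitrary, and `floor()` is assumed as the rounding function.

```r
set.seed(42)  # arbitrary fixed seed so the random split is reproducible

# Hypothetical 10-row data frame standing in for df
toy_df <- data.frame(x = 1:10, y = rep(c("a", "b"), 5))

# Step 1: upper bound on the training-set size (80% of rows, rounded down)
sample_size <- floor(0.8 * nrow(toy_df))

# Step 2: draw sample_size distinct row indices (sample() does not replace by default)
train <- sample(seq_len(nrow(toy_df)), size = sample_size)

# Step 3: rows indexed by train form the training set;
# negative indexing drops those rows, leaving the test set
trainset <- toy_df[train, ]
testset  <- toy_df[-train, ]

print(nrow(trainset))  # 8
print(nrow(testset))   # 2
```

One caveat of negative indexing: if train were empty, `toy_df[-train, ]` would return zero rows rather than all rows, so the approach relies on sample_size being at least 1.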

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Set the upper bound for the length of the training set
sample_size <- ___(___ * nrow(df))

# Assign rows to training set randomly
train <- ___(seq_len(nrow(df)), size = ___)

# Yield training and test sets
trainset <- df[___, ]
testset <- df[-___, ]