
Creating training and test datasets

Splitting a dataset into training and test sets is an important step in building and testing a classification model. The training set is used to build the model, and the test set is used to evaluate its predictive accuracy on data the model did not see during training.

In this exercise, you will split the dataset you created in the previous chapter into training and test sets. The dataset has been loaded into the data frame df, and a seed has already been set to ensure reproducibility. Recall that in the previous video, we set the upper bound for the size of the training set with a few handy functions; now it's your turn to implement them!
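The upper bound is a whole number of rows that is no larger than 80% of nrow(df). A small sketch, assuming floor() is one of the "handy functions" (the text does not name them), shows how rounding down behaves for a few illustrative row counts:

```r
# floor() rounds down, so floor(prop * n) never exceeds prop * n:
# the training set gets at most 80% of the rows.
prop <- 0.8
n_rows <- c(100, 101, 150, 7)  # illustrative row counts, not from the exercise
sample_sizes <- floor(prop * n_rows)

print(data.frame(n_rows, exact = prop * n_rows, sample_size = sample_sizes))
```

Rounding down (rather than using round()) guarantees the training set never takes more than the intended share of rows.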

This exercise is part of the course

Support Vector Machines in R


Exercise instructions

  • Determine the upper bound for the number of rows in the training set and store it in sample_size.
  • Create the vector train, which stores the row indices randomly assigned to the training set according to the 80/20 split.
  • Assign the rows indexed by train to the data frame trainset and the remaining rows to the data frame testset.
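Taking "the remaining rows" is usually done with negative indexing, df[-train, ]. The sketch below, using a hypothetical toy data frame rather than the exercise's df, compares that with an explicit complement built by setdiff(), and shows one edge case where the two differ:

```r
# Hypothetical toy data frame standing in for df
toy <- data.frame(id = 1:5, label = c("a", "b", "a", "b", "a"))

# For a non-empty train vector, both ways select the same remaining rows
train <- c(2L, 5L)
rest_negative <- toy[-train, ]
rest_setdiff  <- toy[setdiff(seq_len(nrow(toy)), train), ]

# Edge case: if train is empty (e.g. floor(0.8 * 1) is 0), -integer(0) is
# still integer(0), so negative indexing selects NO rows instead of all rows
empty_train <- integer(0)
n_via_negative <- nrow(toy[-empty_train, ])
n_via_setdiff  <- nrow(toy[setdiff(seq_len(nrow(toy)), empty_train), ])

print(c(negative = n_via_negative, setdiff = n_via_setdiff))
```

With an 80/20 split, train is only empty when the data frame has a single row, so df[-train, ] is fine for this exercise; setdiff() is simply the more defensive choice in general-purpose code.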

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Set the upper bound for the length of the training set
sample_size <- ___(___ * nrow(df))

# Assign rows to training set randomly
train <- ___(seq_len(nrow(df)), size = ___)

# Yield training and test sets
trainset <- df[___, ]
testset <- df[-___, ]
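One possible completion of the scaffold is sketched here, assuming the blanks are filled with floor(), 0.8, sample(), sample_size, and train, as the instructions suggest. Because the real df from the previous chapter is not available, the sketch builds a hypothetical stand-in data frame and sets its own arbitrary seed:

```r
set.seed(42)  # arbitrary seed; the exercise sets its own

# Hypothetical stand-in for df: two numeric predictors and a binary label
df <- data.frame(
  x1 = runif(50),
  x2 = runif(50),
  y  = factor(sample(c(-1, 1), 50, replace = TRUE))
)

# Set the upper bound for the size of the training set (80%, rounded down)
sample_size <- floor(0.8 * nrow(df))

# Assign rows to the training set at random (sample() draws without replacement)
train <- sample(seq_len(nrow(df)), size = sample_size)

# Yield training and test sets
trainset <- df[train, ]
testset  <- df[-train, ]

# Sanity checks: sizes add up and no row appears in both sets
stopifnot(nrow(trainset) + nrow(testset) == nrow(df))
stopifnot(length(intersect(rownames(trainset), rownames(testset))) == 0)

print(c(train = nrow(trainset), test = nrow(testset)))
```

With 50 rows this gives 40 training rows and 10 test rows; with the real df the counts depend on its number of rows. Sampling from seq_len(nrow(df)) rather than passing a single number keeps sample() from treating its argument as 1:n in unexpected ways.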