Creating training and test datasets

Splitting a dataset into training and test sets is an important step in building and testing a classification model. The training set is used to build the model and the test set to evaluate its predictive accuracy.

In this exercise, you will split the dataset you created in the previous chapter into training and test sets. The dataset has been loaded in the data frame df, and a seed has already been set to ensure reproducibility. Recall that the previous video used a couple of handy functions to set the upper bound on the number of rows in the training set. Now it's your turn to apply them!

This exercise is part of the course

Support Vector Machines in R

Exercise instructions

  • Determine the upper bound for the number of rows to be in the training set and store it in sample_size.
  • Create the vector train, which holds the randomly sampled row indices for the training set according to the 80/20 proportion.
  • Assign the rows indexed by train to the data frame trainset and the remaining rows to the data frame testset.

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Set the upper bound for the length of the training set
sample_size <- ___(___ * nrow(df))

# Assign rows to training set randomly
train <- ___(seq_len(nrow(df)), size = ___)

# Yield training and test sets
trainset <- df[___, ]
testset <- df[-___, ]
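
For reference, the same three steps can be sketched on a made-up data frame. This is only an illustration: the toy df below is a hypothetical stand-in for the exercise's dataset, the seed value is arbitrary, and the choice of floor() and sample() is an assumption about which "handy functions" are meant.

```r
# Hypothetical stand-in for the exercise's df (the real dataset is not shown here)
set.seed(42)  # arbitrary seed, only so this sketch is reproducible
df <- data.frame(x1 = runif(200), x2 = runif(200))
df$y <- factor(ifelse(df$x1 - df$x2 > 0, 1, -1))

# Upper bound on the number of training rows: 80% of the data, rounded down
# (floor() is an assumption; the exercise leaves this function blank)
sample_size <- floor(0.8 * nrow(df))

# Draw that many distinct row indices at random (sample() defaults to no replacement)
train <- sample(seq_len(nrow(df)), size = sample_size)

# Rows indexed by train form the training set; negative indexing keeps the rest
trainset <- df[train, ]
testset <- df[-train, ]

nrow(trainset)  # 160
nrow(testset)   # 40
```

Because sample() draws without replacement by default, no row lands in both sets, and the negative index df[-train, ] picks up exactly the rows that were not sampled.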