Try a 60/40 split
As you saw in the video, you'll be working with the Sonar dataset in this chapter, using a 60% training set and a 40% test set. We'll practice making a train/test split one more time, just to be sure you have the hang of it. Recall that you can use the sample() function to get a random permutation of a dataset's row indices for making train/test splits, e.g.:
n_obs <- nrow(my_data)
permuted_rows <- sample(n_obs)
And then use those row indices to randomly reorder the dataset, e.g.:
my_data <- my_data[permuted_rows, ]
Once your dataset is randomly ordered, you can split off the first 60% as a training set and the last 40% as a test set.
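The whole procedure (permute, reorder, cut at 60%) can be wrapped in a small function. This is a minimal base R sketch, assuming only built-in data: it uses iris as a stand-in for Sonar, and split_train_test() is a hypothetical helper name, not a base R or caret function.

```r
# Sketch of a shuffled 60/40 train/test split in base R.
# split_train_test() is a hypothetical helper, not a base R or caret function.
split_train_test <- function(df, train_frac = 0.6) {
  n_obs <- nrow(df)
  # sample(n) returns a random permutation of 1:n, each index exactly once
  permuted_rows <- sample(n_obs)
  # drop = FALSE keeps a data frame even if df has a single column
  shuffled <- df[permuted_rows, , drop = FALSE]
  # Number of rows in the training set; round() handles fractional products
  split <- round(n_obs * train_frac)
  # Logical mask: TRUE for the first `split` shuffled rows, FALSE for the rest
  is_train <- seq_len(n_obs) <= split
  list(
    train = shuffled[is_train, , drop = FALSE],
    test  = shuffled[!is_train, , drop = FALSE]
  )
}

set.seed(42)  # make the shuffle reproducible
parts <- split_train_test(iris)  # iris stands in for Sonar here
nrow(parts$train)  # 90
nrow(parts$test)   # 60
```

Using a logical mask instead of (split + 1):n_obs sidesteps R's descending-sequence pitfall: if split ever equals n_obs, (n_obs + 1):n_obs counts down and selects the wrong rows.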
This exercise is part of the course Machine Learning with caret in R.
Exercise instructions
- Get the number of observations (rows) in Sonar, assigning to n_obs.
- Shuffle the row indices of Sonar and store the result in permuted_rows.
- Use permuted_rows to randomly reorder the rows of Sonar, saving as Sonar_shuffled.
- Identify the proper row to split on for a 60/40 split. Store this row number as split.
- Save the first 60% of Sonar_shuffled as a training set.
- Save the last 40% of Sonar_shuffled as the test set.
Have a go at this exercise by completing this sample code.
# Get the number of observations
# Shuffle row indices: permuted_rows
# Randomly order data: Sonar_shuffled
# Identify row to split on: split
split <- round(n_obs * ___)
# Create train
# Create test
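For reference, here is one possible completion of the scaffold, following the exercise instructions step by step. It assumes the mlbench package is installed (it provides the Sonar data frame); the set.seed() call and the names train and test are choices made here, not part of the original scaffold.

```r
# One possible completion of the exercise scaffold.
# Assumes the mlbench package is installed; it provides the Sonar data frame.
library(mlbench)
data(Sonar)

set.seed(42)  # added for reproducibility; not in the original scaffold

# Get the number of observations
n_obs <- nrow(Sonar)

# Shuffle row indices: permuted_rows
permuted_rows <- sample(n_obs)

# Randomly order data: Sonar_shuffled
Sonar_shuffled <- Sonar[permuted_rows, ]

# Identify row to split on: split
split <- round(n_obs * 0.6)

# Create train: the first 60% of the shuffled rows
train <- Sonar_shuffled[1:split, ]

# Create test: the remaining 40% of the shuffled rows
test <- Sonar_shuffled[(split + 1):n_obs, ]
```

Sonar has 208 rows, so split is round(208 * 0.6) = round(124.8) = 125, giving 125 training rows and 83 test rows regardless of the seed.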