
Try a 60/40 split

As you saw in the video, this chapter uses the Sonar dataset with a 60% training set and a 40% test set. You'll practice making a train/test split one more time, just to be sure you have the hang of it. Recall that calling sample() on the number of rows returns a random permutation of the row indices, which you can use when making train/test splits, e.g.:

n_obs <- nrow(my_data)
permuted_rows <- sample(n_obs)

And then use those row indices to randomly reorder the dataset, e.g.:

my_data <- my_data[permuted_rows, ]

Once your dataset is randomly ordered, you can split off the first 60% as a training set and the last 40% as a test set.
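
The whole sequence can be sketched end to end as below. This is a minimal illustration, not the exercise solution: it uses R's built-in iris data as a stand-in for my_data, and the set.seed() call and the train and test names are assumptions added here so the shuffle is reproducible.

```r
# Minimal sketch on the built-in iris data (a stand-in, not the Sonar data)
set.seed(42)  # assumption: fixed seed so the shuffle is reproducible

my_data <- iris
n_obs <- nrow(my_data)

# Random permutation of the row indices 1..n_obs
permuted_rows <- sample(n_obs)

# Reorder the rows using the permuted indices
my_data <- my_data[permuted_rows, ]

# Last row of the 60% training portion
split <- round(n_obs * 0.60)

# First 60% of the shuffled rows for training, the rest for testing
train <- my_data[1:split, ]
test <- my_data[(split + 1):n_obs, ]

nrow(train)
nrow(test)
```

With the 150 rows of iris, this gives 90 training rows and 60 test rows, and no row appears in both sets.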

This exercise is part of the course

Machine Learning with caret in R


Exercise instructions

  • Get the number of observations (rows) in Sonar, assigning to n_obs.
  • Shuffle the row indices of Sonar and store the result in permuted_rows.
  • Use permuted_rows to randomly reorder the rows of Sonar, saving as Sonar_shuffled.
  • Identify the proper row to split on for a 60/40 split. Store this row number as split.
  • Save the first 60% of Sonar_shuffled as a training set.
  • Save the last 40% of Sonar_shuffled as the test set.
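
The index arithmetic behind the last three steps can be checked on its own. The sketch below assumes Sonar has 208 rows, as in the copy shipped with the mlbench package; train_rows and test_rows are hypothetical names used only for illustration.

```r
# Assumption: Sonar, as shipped in the mlbench package, has 208 rows
n_obs <- 208

# 208 * 0.60 = 124.8, which rounds to 125
split <- round(n_obs * 0.60)

# Row positions for each set after shuffling
train_rows <- 1:split
test_rows <- (split + 1):n_obs  # parentheses needed: `:` binds tighter than `+`

length(train_rows)  # 125
length(test_rows)   # 83
```

Without the parentheses, split + 1:n_obs would add split to every element of 1:n_obs, producing row numbers that run past the end of the data.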

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Get the number of observations


# Shuffle row indices: permuted_rows


# Randomly order data: Sonar_shuffled


# Identify row to split on: split
split <- round(n_obs * ___)

# Create train


# Create test