Try an 80/20 split
Now that your dataset is randomly ordered, you can use the first 80% of it as a training set and the last 20% as a test set. Start by choosing a split point approximately 80% of the way through your data:
split <- round(nrow(mydata) * 0.80)
You can then use this point to break off the first 80% of the dataset as a training set:
mydata[1:split, ]
And then you can use that same point to determine the test set:
mydata[(split + 1):nrow(mydata), ]
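The steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: it uses R's built-in mtcars data (32 rows) as a stand-in for mydata, shuffles it with sample() under an arbitrary seed, and stores the two pieces in variables named train and test.

```r
# Illustrative sketch: mtcars (32 rows) stands in for your own dataset
set.seed(42)                                 # arbitrary seed, only for a reproducible shuffle
mydata <- mtcars[sample(nrow(mtcars)), ]     # randomly reorder the rows

split <- round(nrow(mydata) * 0.80)          # 32 * 0.80 = 25.6, which rounds to 26

train <- mydata[1:split, ]                   # rows 1 through 26
test  <- mydata[(split + 1):nrow(mydata), ]  # rows 27 through 32

# The two pieces should cover every row exactly once
nrow(train)                                          # 26
nrow(test)                                           # 6
length(intersect(rownames(train), rownames(test)))   # 0, so no row is in both sets
```

Because 80% of the row count is rarely a whole number, round() picks the nearest row, so the split is only approximately 80/20 (here 26 and 6 rows).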
This is a part of the course “Machine Learning with caret in R”
Exercise instructions
- Choose a row index to split on so that the split point is approximately 80% of the way through the diamonds dataset. Call this index split.
- Create a training set called train using that index.
- Create a test set called test using that index.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Determine row to split on: split
# Create train
# Create test