# Try an 80/20 split

Now that your dataset is randomly ordered, you can use the first 80% of it as a training set and the last 20% as a test set. To do this, choose a split point approximately 80% of the way through your data:

```
split <- round(nrow(mydata) * 0.80)
```
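Because the row count is rarely a multiple of five, `round()` is what turns the 80% mark into a usable whole-number row index. As a small illustration, here is the same calculation for a few hypothetical row counts (the vector `n_rows` is made up for this sketch and is not part of the exercise):

```r
# Hypothetical row counts, chosen only to illustrate the rounding
n_rows <- c(10, 32, 1000)

# Same formula as above, applied to each row count
split_points <- round(n_rows * 0.80)

split_points  # 8 26 800
```

For 32 rows, 80% is 25.6, so the split point becomes row 26 and the training set ends up slightly larger than an exact 80%.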

You can then use this point to break off the first 80% of the dataset as a training set:

```
mydata[1:split, ]
```

And then you can use that same point to determine the test set:

```
mydata[(split + 1):nrow(mydata), ]
```
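Putting the three steps together, a minimal sketch might look like the following. It assumes the built-in `mtcars` data frame as a stand-in for `mydata` and repeats a shuffle with an arbitrary seed so that it runs on its own; the names `train` and `test` follow the exercise below.

```r
# Assumption: mtcars (built into R) stands in for `mydata`
set.seed(42)  # arbitrary seed, only for reproducibility
mydata <- mtcars[sample(nrow(mtcars)), ]  # random row order

# Row index roughly 80% of the way through the data
split <- round(nrow(mydata) * 0.80)

# First 80% of the rows: training set
train <- mydata[1:split, ]

# Remaining rows: test set
test <- mydata[(split + 1):nrow(mydata), ]

# Sanity checks: every row is used exactly once
stopifnot(nrow(train) + nrow(test) == nrow(mydata))
stopifnot(length(intersect(rownames(train), rownames(test))) == 0)

nrow(train)  # 26
nrow(test)   # 6
```

One caveat: if `split` happens to equal `nrow(mydata)`, which can occur for a data frame with only one or two rows, `(split + 1):nrow(mydata)` counts backwards instead of returning an empty range, so this indexing pattern is only safe when the test set has at least one row.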

This exercise is part of the course

## “Machine Learning with caret in R”

### Exercise instructions

- Choose a row index to split on so that the split point is approximately 80% of the way through the `diamonds` dataset. Call this index `split`.
- Create a training set called `train` using that index.
- Create a test set called `test` using that index.

### Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

```
# Determine row to split on: split
# Create train
# Create test
```
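One possible way to fill in the sample code is sketched below. It assumes that `diamonds` is the data frame shipped with the ggplot2 package and that its rows still need shuffling when the code is run outside the course environment; the seed value is arbitrary.

```r
# Assumption: `diamonds` comes from the ggplot2 package
library(ggplot2)

# Assumption: repeat the shuffle from the previous exercise so this runs standalone
set.seed(42)
diamonds <- diamonds[sample(nrow(diamonds)), ]

# Determine row to split on: split
split <- round(nrow(diamonds) * 0.80)

# Create train
train <- diamonds[1:split, ]

# Create test
test <- diamonds[(split + 1):nrow(diamonds), ]
```

Checking that `nrow(train) + nrow(test)` equals `nrow(diamonds)` is a quick way to confirm that no row was dropped or used twice.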


This course teaches the big ideas in machine learning, such as how to build and evaluate predictive models.

In the first chapter of this course, you'll fit regression models with `train()` and evaluate their out-of-sample performance using cross-validation and root-mean-square error (RMSE).
