Split out the train and test sets
The first step in training a model is dividing the data into train and test sets. The tidymodels package makes this easy. Setting aside a test set lets you evaluate the trained model on data it has never seen.
You will use the employee healthcare attrition data, which describes employees of a healthcare company and whether they left the company. It is available in attrition_df. The target variable is Attrition.
The tidyverse and tidymodels packages have been loaded for you.
This exercise is part of the course Dimensionality Reduction in R.
Exercise instructions
- Initialize a split of the data with 80% for training and stratify based on Attrition, the target variable.
- Extract the training data set and store it in train.
- Extract the testing data set and store it in test.
Have a go at this exercise by completing this sample code.
# Initialize the split
split <- ___(___, ___ = ___, strata = ___)
# Extract training set
train <- ___ %>% ___()
# Extract testing set
test <- ___ %>% ___()
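
One way the blanks in the template might be filled in is sketched below. It is an illustration, not the course's official solution. It assumes the rsample functions initial_split(), training(), and testing(), which are attached when tidymodels is loaded. Because attrition_df exists only in the course environment, a small made-up data frame with the same name stands in for it; the Age and MonthlyIncome columns and all values are hypothetical.

```r
library(tidymodels)  # attaches rsample (initial_split, training, testing) and the %>% pipe

set.seed(123)  # make the random split reproducible

# ASSUMPTION: made-up stand-in for the course's attrition_df.
# Only the Attrition column name comes from the exercise text.
attrition_df <- data.frame(
  Age = sample(20:60, 200, replace = TRUE),
  MonthlyIncome = round(runif(200, min = 2000, max = 12000)),
  Attrition = factor(rep(c("No", "Yes"), times = c(160, 40)))
)

# Initialize the split: 80% for training, stratified on the target variable
split <- initial_split(attrition_df, prop = 0.8, strata = Attrition)

# Extract training set
train <- split %>% training()

# Extract testing set
test <- split %>% testing()

# Compare class proportions to check that stratification kept them similar
prop.table(table(train$Attrition))
prop.table(table(test$Attrition))
```

Stratifying on Attrition keeps the share of employees who left roughly equal in both sets, which matters when one class is much rarer than the other. The set.seed() call is not part of the exercise template; it is added here only so the sketch gives the same split on every run.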