Creating training and test datasets
The rsample package is designed to create training and test datasets. Creating a test dataset is important for estimating how a trained model will likely perform on new data. It also helps detect overfitting, where a model memorizes patterns that exist only in the training data and then performs poorly on new data.
In this exercise, you will create training and test datasets from the home_sales data. This data contains information on homes sold in the Seattle, Washington area between 2015 and 2016.
The outcome variable in this data is selling_price.
The tidymodels package will be pre-loaded in every exercise in the course. The home_sales tibble has also been loaded for you.
This exercise is part of the course Modeling with tidymodels in R.
Have a go at this exercise by completing this sample code.
# Create a data split object
home_split <- ___(home_sales,
                  prop = ___,
                  strata = ___)
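
For comparison, here is a minimal sketch of the same kind of split on simulated data rather than on home_sales. The toy_sales data frame, its sqft column, the 0.8 proportion, and the seed are all illustrative assumptions and are not taken from the exercise; only initial_split(), training(), and testing() come from rsample.

```r
library(rsample)

set.seed(2024)  # assumed seed, only so the random split is reproducible

# Hypothetical stand-in for home_sales: 200 simulated rows
toy_sales <- data.frame(
  sqft          = round(runif(200, min = 600, max = 4000)),
  selling_price = round(rlnorm(200, meanlog = 13, sdlog = 0.4))
)

# Split the rows, stratifying on the outcome so that training and test
# sets cover a similar range of selling prices (prop = 0.8 is an assumption)
toy_split <- initial_split(toy_sales, prop = 0.8, strata = selling_price)

# Extract the two datasets from the split object
toy_train <- training(toy_split)
toy_test  <- testing(toy_split)

# Compare the sizes of the two datasets
nrow(toy_train)
nrow(toy_test)
```

Stratifying on a numeric outcome bins it into quantiles before sampling, which is why the simulated data has a few hundred rows: with very small datasets rsample may warn and fall back to an unstratified split.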