Exercise

Creating training and test datasets

The rsample package is designed to create training and test datasets. A test dataset is important for estimating how a trained model will perform on new data. It also helps detect overfitting, where a model memorizes patterns that exist only in the training data and then performs poorly on new data.

In this exercise, you will create training and test datasets from the home_sales data. This data contains information on homes sold in the Seattle, Washington area between 2015 and 2016.

The outcome variable in this data is selling_price.

The tidymodels package will be pre-loaded in every exercise in the course. The home_sales tibble has also been loaded for you.

Instructions 1/4

  • Create an rsample object, home_split, that contains the instructions for randomly splitting the home_sales data into a training and a test dataset.
  • Allocate 70% of the data to training and stratify the split by selling_price.
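The steps above map onto rsample's initial_split(), training() and testing() functions. The sketch below assumes a hypothetical stand-in tibble, home_sales_demo, filled with simulated values, so that it runs outside the course environment. Its columns and values are made up for illustration. In the exercise itself you would pass the pre-loaded home_sales tibble instead.

```r
# Minimal sketch, assuming a hypothetical stand-in for home_sales
library(rsample)
library(tibble)

set.seed(42)

# Hypothetical stand-in for the pre-loaded home_sales tibble:
# only the outcome column and one made-up predictor are simulated
home_sales_demo <- tibble(
  selling_price = round(runif(1000, min = 300000, max = 900000)),
  sqft_living = round(runif(1000, min = 800, max = 4000))
)

# Split instructions: 70% of rows go to training, stratified on the outcome
home_split <- initial_split(home_sales_demo,
                            prop = 0.7,
                            strata = selling_price)

# Materialize the training and test tibbles from the split object
home_training <- training(home_split)
home_test <- testing(home_split)

# Row counts of each partition
nrow(home_training)
nrow(home_test)
```

Stratifying on the outcome keeps the distribution of selling_price similar in the training and test data. For a numeric strata variable, rsample bins the values into quantile-based groups (breaks = 4 by default) and samples within each group. Because of this, the training share can land slightly off exactly 70%.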