Exercise

The test-train split

In a disciplined machine learning workflow it is crucial to withhold a portion of your data (testing data) from any decision-making process. This allows you to independently assess the performance of your model when it is finalized. The remaining data, the training data, is used to build and select the best model.

In this exercise, you will use the rsample package to perform the initial train-test split of your gapminder data.

Note: Since this is a random split of the data, it is good practice to set a seed before splitting so that the same split can be reproduced later.
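As a minimal illustration of why the seed matters, the sketch below uses base R's sample() rather than rsample, and the seed value 42 is an arbitrary assumption. Resetting the same seed before each draw gives the same random ordering.

```r
# Arbitrary seed value, chosen only for illustration
set.seed(42)
first_draw <- sample(1:10)

# Resetting the same seed reproduces the same random ordering
set.seed(42)
second_draw <- sample(1:10)

identical(first_draw, second_draw)
```

The same idea applies to a random train-test split: with the seed set, rerunning the split should assign the same rows to each partition.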

Instructions
  • Split your data into 75% training and 25% testing using the initial_split() function and assign it to gap_split.
  • Extract the training dataframe from gap_split using the training() function and assign it to training_data.
  • Extract the testing dataframe from gap_split using the testing() function and assign it to testing_data.
  • Ensure that the dimensions of your new dataframes are what you expected by using the dim() function on training_data and testing_data.
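The steps above can be sketched as follows. This is one possible approach, not the only valid solution. In the exercise the gapminder data is already loaded, so the small stand-in data frame built here (with hypothetical columns country, year and life_expectancy) and the seed value 42 are assumptions made only so the sketch runs on its own.

```r
library(rsample)

# Assumption: a small stand-in for the gapminder data, with
# hypothetical columns, so that this sketch is self-contained.
gapminder <- data.frame(
  country = rep(c("A", "B", "C", "D"), each = 25),
  year = rep(1:25, times = 4),
  life_expectancy = seq(50, 80, length.out = 100)
)

# Set a seed so the random split is reproducible (42 is arbitrary)
set.seed(42)

# Split the data: 75% for training, the remaining 25% for testing
gap_split <- initial_split(gapminder, prop = 0.75)

# Extract the two dataframes from the split object
training_data <- training(gap_split)
testing_data <- testing(gap_split)

# Check the dimensions: the row counts should add up to the original,
# and both dataframes should keep all of the original columns
dim(training_data)
dim(testing_data)
```

The row counts of training_data and testing_data should sum to the number of rows in the original data, with roughly three quarters of the rows in training_data.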