Exercise

Creating training and test datasets

Splitting a dataset into training and test sets is an important step in building and testing a classification model. The training set is used to build the model and the test set to evaluate its predictive accuracy.

In this exercise, you will split the dataset you created in the previous chapter into training and test sets. The dataset has been loaded in the dataframe df and a seed has already been set to ensure reproducibility.

Instructions
  • Create a column called train in df and randomly assign 80% of the rows in df a value of 1 for this column (and the remaining rows a value of 0).
  • Assign the rows with train == 1 to the dataframe trainset and those with train == 0 to the dataframe testset.
  • Remove the train column from both trainset and testset, selecting it by its column index.
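
The three steps above can be sketched as follows. This is only an illustration, not the exercise's expected solution: the source does not say which language or dataframe library the exercise uses, so the sketch uses plain Python with a header list and rows stored as lists as a stand-in for df. The column names, the sample data, and the seed value are all assumptions. It marks exactly 80% of the rows by sampling row positions; drawing a uniform random number per row and comparing it with 0.8 is a common alternative that gives roughly 80%.

```python
import random

# Hypothetical stand-in for df: a header list plus rows stored as lists.
columns = ["x1", "x2", "y"]
rows = [[i, i * 2, i % 2] for i in range(10)]

random.seed(42)  # assumed seed value; the exercise sets its own

# Step 1: add a train column, marking exactly 80% of the rows with 1.
n_train = round(0.8 * len(rows))
train_ids = set(random.sample(range(len(rows)), n_train))
columns = columns + ["train"]
rows = [row + [1 if i in train_ids else 0] for i, row in enumerate(rows)]

# Step 2: split the rows on the value of the train column.
train_col = columns.index("train")
trainset = [row for row in rows if row[train_col] == 1]
testset = [row for row in rows if row[train_col] == 0]

# Step 3: remove the train column from both sets by its index.
def drop_column(table, col):
    """Return a copy of table without the column at position col."""
    return [row[:col] + row[col + 1:] for row in table]

trainset = drop_column(trainset, train_col)
testset = drop_column(testset, train_col)
final_columns = columns[:train_col] + columns[train_col + 1:]
```

Looking up the column position by name first, then dropping by that position, keeps the removal correct even if train is not the last column.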