Exercise

Partitioning

In order to properly evaluate a model, one can partition the data in a train and test set. The train set contains the data the model is built on, and the test data is used to evaluate the model. This division is done randomly, but when the target incidence is low, it could be necessary to stratify, that is, to make sure that the train and test data contain an equal percentage of targets.

In this exercise you will partition the data with stratification and verify that the train and test data have equal target incidence. The train_test_split method has already been imported, and the X and y DataFrames are available in your workspace.

Instructions

100 XP
  • Stratify these DataFrames using the train_test_split method. Make sure that train and test set are the same size, and have equal target incidence.
  • Calculate the target incidence of the train set. This is the number of targets in the train set divided by the number of observations in the train set.
  • Calculate the target incidence of the test set.