Exercise

Train/test split

To assess a machine learning model objectively, you need to test it on an independent set of data. You can't reuse the data that you used to train the model: the model has already seen those records, so it will of course perform (relatively) well on them, and the result tells you little about how it handles new data.

You will split the data into two components:

  • training data (used to train the model) and
  • testing data (used to test the model).
Instructions
  • Randomly split the flights data into two sets with 80:20 proportions. For repeatability set a random number seed of 17 for the split.
  • Check that the training data has roughly 80% of the records from the original data.
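The two steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the exercise's expected solution: the `train_test_split` helper and the dummy `flights` records are hypothetical stand-ins, and the exercise environment may provide its own splitting function. Each record is assigned to the training set with probability 0.8, which is why the training share comes out at roughly, rather than exactly, 80%.

```python
import random


def train_test_split(records, train_fraction=0.8, seed=17):
    """Randomly assign each record to the training or the testing set.

    Hypothetical helper for illustration; not part of any library.
    """
    rng = random.Random(seed)  # a seeded generator makes the split repeatable
    train, test = [], []
    for record in records:
        # Each record lands in the training set with probability train_fraction.
        if rng.random() < train_fraction:
            train.append(record)
        else:
            test.append(record)
    return train, test


# Assumption: 1000 dummy records standing in for the real flights data.
flights = [{"flight_id": i} for i in range(1000)]

flights_train, flights_test = train_test_split(flights)

# Share of the original records that ended up in the training set.
training_ratio = len(flights_train) / len(flights)
print(f"training ratio: {training_ratio:.3f}")
```

Calling `train_test_split(flights)` again with the same seed returns the same two sets, which is the point of fixing the seed at 17.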