Session Ready
Exercise

Spliting employee data

Overfitting the dataset is a common problem in analytics. This happens when a model is working well on the dataset it was developed upon, but fails to generalize outside of it.

A train/test split is implemented to ensure model generalization: you develop the model using the training sample and try it out on the test sample later on.

In this exercise, you will split both target and features into train and test sets with 75%/25% ratio, respectively.

Instructions
100 XP
  • Import train_test_split from the sklearn.model_selection module
  • Use train_test_split() to split your dataset into training and testing sets
  • Assign 25% of your observations to the testing set