Splitting employee data
Overfitting is a common problem in analytics. It happens when a model performs well on the dataset it was developed on but fails to generalize to data outside of it.
A train/test split is used to check how well a model generalizes: you develop the model on the training sample and evaluate it on the test sample afterwards.
In this exercise, you will split both the target and the features into train and test sets with a 75%/25% ratio, respectively.
This exercise is part of the course
HR Analytics: Predicting Employee Churn in Python
Exercise instructions
- Import train_test_split from the sklearn.model_selection module
- Use train_test_split() to split your dataset into training and testing sets
- Assign 25% of your observations to the testing set
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Import the function for splitting dataset into train and test
from sklearn.model_selection import ____
# Use that function to create the splits both for target and for features
# Set the test sample to be 25% of your observations
target_train, target_test, features_train, features_test = ____(target,features,____=0.25,random_state=42)
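
To see what a 75%/25% split produces, here is a minimal sketch on synthetic stand-in data. The arrays below are invented for illustration (100 hypothetical employees, 3 numeric features, a binary churn flag) and are not the course's dataset; only train_test_split and its test_size and random_state parameters come from scikit-learn.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data, NOT the course dataset:
# 100 employees, 3 numeric features, binary churn target
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 3))
target = rng.integers(0, 2, size=100)

# Outputs follow the order of the inputs: train/test for the first
# array passed (target), then train/test for the second (features)
target_train, target_test, features_train, features_test = train_test_split(
    target, features, test_size=0.25, random_state=42
)

print(features_train.shape, features_test.shape)  # (75, 3) (25, 3)
print(target_train.shape, target_test.shape)      # (75,) (25,)
```

Because the target is passed first, its train and test pieces come back first. Fixing random_state makes the shuffle reproducible, so the same rows land in each set on every run.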