Limiting the sample size
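Another way to prevent overfitting is to set the minimum number of observations required in each leaf (terminal node) of the Decision Tree. A split is only made if it leaves at least that many training observations in each of the resulting branches.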
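The following sketch checks that the limit holds. It assumes scikit-learn's `min_samples_leaf` parameter and uses a synthetic dataset from `make_classification` as a stand-in for the employee data. It counts how many training observations land in each leaf.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the employee data (an assumption, not the course dataset)
X, y = make_classification(n_samples=1000, n_features=8, random_state=42)

# Require at least 100 training observations in every leaf
tree = DecisionTreeClassifier(min_samples_leaf=100, random_state=42)
tree.fit(X, y)

# apply() returns the index of the leaf node each sample ends up in
leaf_ids = tree.apply(X)
leaf_sizes = np.bincount(leaf_ids)
leaf_sizes = leaf_sizes[leaf_sizes > 0]  # drop indices of internal (non-leaf) nodes

print("number of leaves:", tree.get_n_leaves())
print("smallest leaf size:", leaf_sizes.min())
```

With 1,000 observations and at least 100 per leaf, the tree can have no more than 10 leaves, which limits how closely it can fit noise.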
In this exercise, you will:
- set this minimum limit to 100
- fit the new model to the employee data
- examine prediction results on both training and test sets
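The comparison in these steps can be sketched as follows. This is a minimal illustration, not the course solution. It assumes a noisy synthetic dataset (`flip_y=0.2` randomly reassigns 20% of the labels) instead of the employee data, and the helper `train_test_accuracy` is hypothetical. It contrasts an unrestricted tree with one limited to 100 observations per leaf.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy stand-in for the employee data (assumption)
X, y = make_classification(n_samples=2000, n_features=10, flip_y=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

def train_test_accuracy(min_samples_leaf):
    """Hypothetical helper: fit a tree with the given leaf limit, return (train %, test %)."""
    model = DecisionTreeClassifier(min_samples_leaf=min_samples_leaf, random_state=42)
    model.fit(X_train, y_train)
    return model.score(X_train, y_train) * 100, model.score(X_test, y_test) * 100

full_train, full_test = train_test_accuracy(1)  # 1 is scikit-learn's default
limited_train, limited_test = train_test_accuracy(100)

print(f"min_samples_leaf=1:   train {full_train:.1f}%, test {full_test:.1f}%")
print(f"min_samples_leaf=100: train {limited_train:.1f}%, test {limited_test:.1f}%")
```

The unrestricted tree memorizes the training set, including the flipped labels. The limited tree gives up some training accuracy, and in return the gap between training and test accuracy narrows.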
The variables features_train, target_train, features_test and target_test are already available in your workspace.
This exercise is part of the course HR Analytics: Predicting Employee Churn in Python.
Exercise instructions
- Initialize the DecisionTreeClassifier and set the minimum leaf size to 100 observations.
- Fit the decision tree model to the training data.
- Check the accuracy of the predictions on both the training and test sets.
Sample code:
from sklearn.tree import DecisionTreeClassifier

# Initialize the DecisionTreeClassifier, requiring at least 100 observations in each leaf
model_sample_100 = DecisionTreeClassifier(min_samples_leaf=100, random_state=42)
# Fit the model to the training data
model_sample_100.fit(features_train, target_train)
# Print the accuracy of the prediction (as a percentage) for the training set
print(model_sample_100.score(features_train, target_train)*100)
# Print the accuracy of the prediction (as a percentage) for the test set
print(model_sample_100.score(features_test, target_test)*100)
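The value 100 is one choice among many. The sketch below shows how training and test accuracy change across several leaf limits. It is an illustration under assumptions: the data is synthetic (`make_classification` with label noise), and the list of limits is arbitrary. The names mirror the workspace variables so the loop can be reused there once the synthetic data lines are dropped.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption); in the exercise workspace, drop these lines
# and use the existing features_train, target_train, features_test, target_test
X, y = make_classification(n_samples=2000, n_features=10, flip_y=0.2, random_state=0)
features_train, features_test, target_train, target_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

results = {}
for leaf_limit in [1, 10, 50, 100, 200]:  # arbitrary illustrative limits
    model = DecisionTreeClassifier(min_samples_leaf=leaf_limit, random_state=42)
    model.fit(features_train, target_train)
    train_acc = model.score(features_train, target_train) * 100
    test_acc = model.score(features_test, target_test) * 100
    results[leaf_limit] = (train_acc, test_acc)
    print(f"min_samples_leaf={leaf_limit:>3}: train {train_acc:.1f}%, test {test_acc:.1f}%")
```

A limit that is too small lets the tree overfit. A limit that is too large underfits. Comparing test accuracy across values is a simple way to see where the balance lies.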