Pruning the tree
Overfitting is a classic problem in analytics, especially for decision trees. A fully grown tree can predict the training sample with very high accuracy yet generalize poorly to the test set. For that reason, the growth of a decision tree is usually controlled by:
- “Pruning” the tree by setting a limit on the maximum depth it can have.
- Limiting the minimum number of observations required in one leaf of the tree.
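Both controls map directly onto parameters of scikit-learn's DecisionTreeClassifier: max_depth caps the depth, and min_samples_leaf sets the minimum leaf size. A minimal sketch (the synthetic dataset here is an illustration, not the course's employee data):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data, purely to illustrate the parameters
features, target = make_classification(n_samples=500, random_state=42)

# max_depth caps how many levels the tree may grow;
# min_samples_leaf forces every leaf to hold at least 10 observations
model = DecisionTreeClassifier(max_depth=5, min_samples_leaf=10, random_state=42)
model.fit(features, target)

print(model.get_depth())  # never exceeds 5
```

Either parameter alone limits overfitting; combining them is common in practice.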
In this exercise, you will:
- prune the tree by limiting its growth to 5 levels of depth
- fit it to the employee data
- test prediction results on both the training and test sets.
The variables features_train, target_train, features_test and target_test are already available in your workspace.
This exercise is part of the course HR Analytics: Predicting Employee Churn in Python.
Exercise instructions
- Initialize the DecisionTreeClassifier while limiting the depth of the tree to 5.
- Fit the Decision Tree model using the features and the target in the training set.
- Check the accuracy of the predictions on both the training and test sets.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Initialize the DecisionTreeClassifier while limiting the depth of the tree to 5
model_depth_5 = DecisionTreeClassifier(____=5, random_state=42)
# Fit the model
____.fit(features_train,target_train)
# Print the accuracy of the prediction for the training set
print(____.____(features_train,target_train)*100)
# Print the accuracy of the prediction for the test set
print(model_depth_5.score(____,____)*100)
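For reference, a completed version of the sketch above might look like the following. This is an assumption-laden sketch: the course's employee data isn't available here, so a synthetic train/test split stands in for features_train, target_train, features_test and target_test.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the employee data used in the course
features, target = make_classification(n_samples=1000, random_state=42)
features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.25, random_state=42)

# Initialize the DecisionTreeClassifier while limiting the depth of the tree to 5
model_depth_5 = DecisionTreeClassifier(max_depth=5, random_state=42)

# Fit the model on the training set
model_depth_5.fit(features_train, target_train)

# Accuracy of the prediction for the training set, as a percentage
print(model_depth_5.score(features_train, target_train) * 100)

# Accuracy of the prediction for the test set, as a percentage
print(model_depth_5.score(features_test, target_test) * 100)
```

A pruned tree typically shows a smaller gap between the two accuracies than a fully grown one, which is the point of the exercise.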