Pruning the tree

Overfitting is a classic problem in analytics, especially for the decision tree algorithm. Once the tree is fully grown, it may provide highly accurate predictions for the training sample, yet generalize poorly to the test set. For that reason, the growth of the decision tree is usually controlled by:

  • “Pruning” the tree, i.e., setting a limit on the maximum depth it can reach.
  • Setting a minimum number of observations required in each leaf of the tree.
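As a quick illustration of the second point (not part of the exercise itself, and assuming scikit-learn with a synthetic dataset standing in for real employee data): an unconstrained tree can memorize the training sample, while requiring a minimum number of observations per leaf curbs that behavior.

```python
# Illustration only: compare an unconstrained tree with one whose leaves
# must contain at least 20 observations. The dataset is synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fully grown tree: no limits on depth or leaf size
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Constrained tree: each leaf must hold at least 20 training observations
pruned_tree = DecisionTreeClassifier(min_samples_leaf=20,
                                     random_state=0).fit(X_train, y_train)

print("full tree   train/test:", full_tree.score(X_train, y_train),
      full_tree.score(X_test, y_test))
print("pruned tree train/test:", pruned_tree.score(X_train, y_train),
      pruned_tree.score(X_test, y_test))
```

The fully grown tree typically scores near 100% on the training set while the constrained tree trades a little training accuracy for better generalization.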

In this exercise, you will:

  • prune the tree by limiting its growth to 5 levels of depth,
  • fit it to the employee data, and
  • test the prediction accuracy on both the training and test sets.

The variables features_train, target_train, features_test, and target_test are already available in your workspace.

This exercise is part of the course

HR Analytics: Predicting Employee Churn in Python


Exercise instructions

  • Initialize the DecisionTreeClassifier while limiting the depth of the tree to 5.
  • Fit the Decision Tree model using the features and the target in the training set.
  • Check the accuracy of the predictions on both the training and test sets.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Initialize the DecisionTreeClassifier while limiting the depth of the tree to 5
model_depth_5 = DecisionTreeClassifier(____=5, random_state=42)

# Fit the model
____.fit(features_train, target_train)

# Print the accuracy of the prediction for the training set
print(____.____(features_train, target_train) * 100)

# Print the accuracy of the prediction for the test set
print(model_depth_5.score(____, ____) * 100)
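For reference, a possible completed version of the sample code is sketched below. The employee data from the course workspace is not available here, so a synthetic classification dataset (an assumption for illustration) stands in for features_train, target_train, features_test, and target_test; the rest follows the exercise steps with scikit-learn.

```python
# Sketch of the completed exercise, assuming scikit-learn.
# A synthetic dataset stands in for the course's employee data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
features_train, features_test, target_train, target_test = train_test_split(
    X, y, random_state=42)

# Initialize the DecisionTreeClassifier while limiting the depth of the tree to 5
model_depth_5 = DecisionTreeClassifier(max_depth=5, random_state=42)

# Fit the model
model_depth_5.fit(features_train, target_train)

# Print the accuracy of the prediction for the training set
print(model_depth_5.score(features_train, target_train) * 100)

# Print the accuracy of the prediction for the test set
print(model_depth_5.score(features_test, target_test) * 100)
```

A gap between the two printed accuracies signals overfitting; with max_depth=5 that gap is usually much smaller than for a fully grown tree.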