Session Ready
Exercise

Pruning the tree

Overfitting is a classic problem in analytics, especially for the decision tree algorithm. Once the tree is fully grown, it may provide highly accurate predictions for the training sample, yet fail to be that accurate on the test set. For that reason, the growth of the decision tree is usually controlled by:

  • “Pruning” the tree and setting a limit on the maximum depth it can have.
  • Limiting the minimum number of observations in one leaf of the tree.

In this exercise, you will:

  • prune the tree and limit the growth of the tree to 5 levels of depth
  • fit it to the employee data
  • test prediction results on both training and testing sets.

The variables features_train, target_train, features_test and target_test are already available in your workspace.

Instructions
100 XP
  • Initialize the DecisionTreeClassifier while limiting the depth of the tree to 5.
  • Fit the Decision Tree model using the features and the target in the training set.
  • Check the accuracy of the predictions on both the training and test sets.