Pruning the tree

Overfitting is a classic problem in analytics, and decision trees are especially prone to it. A fully grown tree may predict the training sample with very high accuracy, yet be noticeably less accurate on the test set. For that reason, the growth of the decision tree is usually controlled by:

“Pruning” the tree and setting a limit on the maximum depth it can have.
Limiting the minimum number of observations in one leaf of the tree.
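
The effect of these two controls can be sketched on synthetic data. This is a minimal illustration, not the course's own code: the dataset comes from scikit-learn's make_classification as a stand-in for the employee data, and the names X_train, X_test, full_tree and small_tree, as well as the leaf size of 20, are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the employee data (an assumption, not the course dataset).
# flip_y adds label noise so that a fully grown tree has something to overfit.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5, flip_y=0.1, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fully grown tree: no limit on depth or leaf size
full_tree = DecisionTreeClassifier(random_state=42)
full_tree.fit(X_train, y_train)

# Constrained tree: at most 5 levels deep, at least 20 observations per leaf
# (the value 20 is arbitrary and only for illustration)
small_tree = DecisionTreeClassifier(
    max_depth=5, min_samples_leaf=20, random_state=42
)
small_tree.fit(X_train, y_train)

# Compare depth and accuracy (as a percentage) on both sets
for name, tree in [("full", full_tree), ("constrained", small_tree)]:
    print(
        name,
        "depth:", tree.get_depth(),
        "train:", tree.score(X_train, y_train) * 100,
        "test:", tree.score(X_test, y_test) * 100,
    )
```

On data like this, the fully grown tree typically fits the training set perfectly, so the gap between its training and test accuracy is the thing to look at when judging overfitting.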

In this exercise, you will:

prune the tree and limit the growth of the tree to 5 levels of depth
fit it to the employee data
test prediction results on both training and testing sets.

The variables features_train, target_train, features_test and target_test are already available in your workspace.

This exercise is part of the course

HR Analytics: Predicting Employee Churn in Python


Exercise instructions

Initialize the DecisionTreeClassifier while limiting the depth of the tree to 5.
Fit the Decision Tree model using the features and the target in the training set.
Check the accuracy of the predictions on both the training and test sets.
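
For the accuracy check, it may help to know that a scikit-learn classifier's score method returns mean accuracy as a fraction between 0 and 1, which is why the result is multiplied by 100 to read as a percentage. The sketch below, run on synthetic data with hypothetical variable names rather than the course's workspace, shows that score agrees with computing accuracy_score on the model's predictions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data for illustration only (not the employee dataset)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

model = DecisionTreeClassifier(max_depth=5, random_state=42)
model.fit(X, y)

# score() predicts on X and compares against y, returning mean accuracy
score_fraction = model.score(X, y)

# The same quantity computed explicitly from the predictions
manual_fraction = accuracy_score(y, model.predict(X))

# Multiply by 100 to express the fraction as a percentage
print(score_fraction * 100)
print(manual_fraction * 100)
```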

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Import the classifier (it may already be loaded in the exercise workspace)
from sklearn.tree import DecisionTreeClassifier

# Initialize the DecisionTreeClassifier while limiting the depth of the tree to 5
model_depth_5 = DecisionTreeClassifier(____=5, random_state=42)

# Fit the model to the training features and target
____.fit(features_train, target_train)

# Print the accuracy of the prediction for the training set, as a percentage
print(____.____(features_train, target_train) * 100)

# Print the accuracy of the prediction for the test set, as a percentage
print(model_depth_5.score(____, ____) * 100)
