Limiting the sample size

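Another way to prevent overfitting is to set the minimum number of observations that each leaf (or node) of the decision tree must contain. A split is only made if every resulting leaf keeps at least that many observations, so the tree cannot grow branches that fit just a handful of records.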
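To make the constraint concrete, here is a minimal sketch on synthetic data. It assumes scikit-learn's min_samples_leaf parameter is the setting meant here; the dataset from make_classification and the names X, y, tree and leaf_sizes are illustrative stand-ins, not the course's employee data.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: NOT the employee dataset)
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)

# Require at least 100 training observations in every leaf
tree = DecisionTreeClassifier(min_samples_leaf=100, random_state=42)
tree.fit(X, y)

# In the fitted tree structure, leaves are the nodes without a left child (-1)
is_leaf = tree.tree_.children_left == -1
leaf_sizes = tree.tree_.n_node_samples[is_leaf]

print("number of leaves:", tree.get_n_leaves())
print("smallest leaf size:", leaf_sizes.min())
```

With 2000 observations and a floor of 100 per leaf, the tree can have at most 20 leaves, which is what keeps it from memorizing individual records.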

In this exercise, you will:

set this minimum limit to 100
fit the new model to the employee data
examine prediction results on both training and test sets

The variables features_train, target_train, features_test and target_test are already available in your workspace.

This exercise is part of the course

HR Analytics: Predicting Employee Churn in Python

Exercise instructions

Initialize the DecisionTreeClassifier and set the minimum number of observations per leaf to 100.
Fit the decision tree model to the training data.
Check the accuracy of the predictions on both the training and test sets.
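The effect of these steps on the gap between training and test accuracy can be sketched as follows. This is an illustration under stated assumptions: the data is synthetic (make_classification), the split comes from train_test_split, and the names full and limited are hypothetical. It does not use the workspace variables from this exercise, and the numbers it prints will differ from those for the employee data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: NOT the employee dataset)
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# An unconstrained tree versus one limited to at least 100 observations per leaf
full = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
limited = DecisionTreeClassifier(min_samples_leaf=100, random_state=42).fit(
    X_train, y_train
)

# score() returns accuracy as a fraction; multiply by 100 for a percentage
for name, model in [("unconstrained", full), ("min_samples_leaf=100", limited)]:
    train_acc = model.score(X_train, y_train) * 100
    test_acc = model.score(X_test, y_test) * 100
    print(f"{name}: train {train_acc:.1f}%, test {test_acc:.1f}%")
```

The unconstrained tree fits the training set perfectly, so its training accuracy says little about how it generalizes. For the limited tree, training and test accuracy are the figures to compare.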

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Initialize the DecisionTreeClassifier while limiting the sample size in leaves to 100
model_sample_100 = DecisionTreeClassifier(____, random_state=42)

# Fit the model
____.fit(features_train, ____)

# Print the accuracy of the predictions (as a percentage) for the training set
print(____.score(features_train, target_train) * 100)

# Print the accuracy of the predictions (as a percentage) for the test set
print(____.____(features_test, target_test) * 100)
