Limiting the sample size

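Another method to prevent overfitting is to specify the minimum number of observations that each leaf (terminal node) of the Decision Tree must contain. A split is then only allowed if both resulting branches keep at least that many observations, so the tree cannot grow leaves that fit just a handful of training rows.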
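
To illustrate the idea outside the exercise, here is a minimal sketch on synthetic data. It assumes scikit-learn's `min_samples_leaf` parameter is the setting meant here; the dataset from `make_classification` and the names `X_train`, `unrestricted` and `limited` are hypothetical stand-ins, not the course's employee data or variables.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption for illustration, not the employee data)
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=5, flip_y=0.1, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# A fully grown tree versus one whose leaves must hold at least 100 observations
unrestricted = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
limited = DecisionTreeClassifier(min_samples_leaf=100, random_state=42).fit(
    X_train, y_train
)

# Compare training accuracy, test accuracy and tree size for both models
for name, model in [("unrestricted", unrestricted), ("min_samples_leaf=100", limited)]:
    train_acc = model.score(X_train, y_train) * 100
    test_acc = model.score(X_test, y_test) * 100
    print(f"{name}: train {train_acc:.1f}%, test {test_acc:.1f}%, "
          f"leaves {model.get_n_leaves()}")
```

A large gap between training and test accuracy is the usual sign of overfitting; the restricted tree typically shows a smaller gap, although the exact numbers depend on the data.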

In this exercise, you will:

set this minimum limit to 100
fit the new model to the employee data
examine prediction results on both training and test sets

The variables features_train, target_train, features_test and target_test are already available in your workspace.

This exercise is part of the course

HR Analytics: Predicting Employee Churn in Python

Exercise instructions

Initialize the DecisionTreeClassifier and set the minimum number of observations per leaf to 100.
Fit the decision tree model to the training data.
Check the accuracy of the predictions on both the training and test sets.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Initialize the DecisionTreeClassifier while limiting the sample size in leaves to 100
model_sample_100 = DecisionTreeClassifier(____, random_state=42)

# Fit the model
____.fit(features_train, ____)

# Print the accuracy of the prediction (as a percentage) for the training set
print(____.score(features_train, target_train) * 100)

# Print the accuracy of the prediction (as a percentage) for the test set
print(____.____(features_test, target_test) * 100)
