Limiting the sample size

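Another way to prevent overfitting is to specify the minimum number of observations that each leaf of the decision tree must contain. A split is only made if it leaves at least that many observations in each resulting leaf, which keeps the tree from carving out tiny leaves that fit noise in the training data.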

In this exercise, you will:

  • set this minimum limit to 100
  • fit the new model to the employee data
  • examine prediction results on both training and test sets

The variables features_train, target_train, features_test and target_test are already available in your workspace.
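To illustrate the effect of a leaf-size limit, here is a minimal sketch that compares an unrestricted tree with a restricted one. It assumes scikit-learn's `min_samples_leaf` parameter is the setting meant here, and it uses synthetic data from `make_classification` as a stand-in for the employee data, so the names `X_train`, `full_tree`, and `limited_tree` are hypothetical and the printed numbers will not match the course results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the employee data (an assumption, not the course dataset)
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=5, flip_y=0.1, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Unrestricted tree: grows until every leaf is pure
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Restricted tree: every leaf must hold at least 100 training observations
limited_tree = DecisionTreeClassifier(min_samples_leaf=100, random_state=42)
limited_tree.fit(X_train, y_train)

# Compare tree size and accuracy (as a percentage) on both sets
for name, tree in [("unrestricted", full_tree), ("min_samples_leaf=100", limited_tree)]:
    train_acc = tree.score(X_train, y_train) * 100
    test_acc = tree.score(X_test, y_test) * 100
    print(f"{name}: leaves={tree.get_n_leaves()}, "
          f"train={train_acc:.1f}%, test={test_acc:.1f}%")

# Verify the limit: count the training observations that land in each leaf
leaf_ids = limited_tree.apply(X_train)
smallest_leaf = np.unique(leaf_ids, return_counts=True)[1].min()
print("smallest leaf size:", smallest_leaf)
```

A large gap between training and test accuracy is the usual sign of overfitting. The restricted tree typically has far fewer leaves and a smaller gap, though the exact figures depend on the data.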

This exercise is part of the course

HR Analytics: Predicting Employee Churn in Python

Exercise instructions

  • Initialize the DecisionTreeClassifier and set the minimum number of observations per leaf to 100.
  • Fit the decision tree model to the training data.
  • Check the accuracy of the predictions on both the training and test sets.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Initialize the DecisionTreeClassifier while limiting the sample size in leaves to 100
model_sample_100 = DecisionTreeClassifier(____, random_state=42)

# Fit the model to the training data
____.fit(features_train, ____)

# Print the accuracy of the predictions (as a percentage) for the training set
print(____.score(features_train, target_train) * 100)

# Print the accuracy of the predictions (as a percentage) for the test set
print(____.____(features_test, target_test) * 100)