Develop and test the best model

In Chapter 3, you found that the following parameters give you a better model:

max_depth = 8,
min_samples_leaf = 150,
class_weight = "balanced"
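For reference, scikit-learn documents class_weight="balanced" as weighting each class by n_samples / (n_classes * np.bincount(y)). The rarer class (here, presumably the employees who left) therefore counts more heavily when the tree chooses its splits. The minimal sketch below reproduces that formula on a small hypothetical label vector and checks it against sklearn.utils.class_weight.compute_class_weight. The labels are illustrative only, not the course data.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical turnover labels: 0 = stayed, 1 = left (deliberately imbalanced)
y = np.array([0] * 8 + [1] * 2)

# Manual "balanced" weights: n_samples / (n_classes * count of each class)
classes = np.unique(y)
manual_weights = len(y) / (len(classes) * np.bincount(y))

# scikit-learn's helper applies the same rule
sk_weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

# Class 0 gets 10 / (2 * 8) = 0.625, class 1 gets 10 / (2 * 2) = 2.5
print(manual_weights)
print(sk_weights)
```

With 8 "stayed" and 2 "left" labels, each leaver is weighted four times as heavily as each stayer, which is what pushes the tree toward catching more leavers (higher recall).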

In this chapter, you discovered that some features have a negligible impact. You realized that you could get accurate predictions using just a small number of selected, impactful features, so you updated your training and testing sets accordingly, creating the variables features_train_selected and features_test_selected.
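The chapter's exact selection code is not reproduced here. As a hedged sketch of one common approach, assuming selection by a fitted tree's feature_importances_, the snippet keeps only the columns whose importance exceeds a threshold. The synthetic data, the column names, and the 0.01 threshold are hypothetical stand-ins, not the course's values.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the HR data (hypothetical column names)
X, y = make_classification(n_samples=1000, n_features=8, n_informative=3,
                           n_redundant=0, random_state=42)
features = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(8)])
features_train, features_test, target_train, target_test = train_test_split(
    features, y, test_size=0.25, random_state=42)

# Fit a tree and read its impurity-based feature importances (they sum to 1)
model = DecisionTreeClassifier(max_depth=8, min_samples_leaf=150,
                               class_weight="balanced", random_state=42)
model.fit(features_train, target_train)
importances = pd.Series(model.feature_importances_, index=features_train.columns)

# Keep only features above a hypothetical importance threshold
selected = importances[importances > 0.01].index.tolist()
features_train_selected = features_train[selected]
features_test_selected = features_test[selected]

print(importances.sort_values(ascending=False))
print("Selected:", selected)
```

Applying the same column list to both the training and the test set matters: the model must see exactly the features at prediction time that it was trained on.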

With all this information at your disposal, you're now going to develop the best model for predicting employee turnover and evaluate it using the appropriate metrics.

The features_train_selected and features_test_selected variables are available in your workspace, and the recall_score and roc_auc_score functions have been imported for you.
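To make the two metrics concrete, here is a small hand-checkable example on hypothetical labels. Recall is TP / (TP + FN), the share of actual leavers the model catches. When roc_auc_score receives hard 0/1 predictions instead of scores, the ROC curve has a single interior point, so the AUC reduces to the average of the true positive rate and the true negative rate, (TPR + TNR) / 2.

```python
import numpy as np
from sklearn.metrics import recall_score, roc_auc_score

# Hypothetical test labels (1 = employee left) and hard predictions
target = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
prediction = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 1])

# Confusion-matrix counts computed by hand
tp = np.sum((target == 1) & (prediction == 1))  # 3
fn = np.sum((target == 1) & (prediction == 0))  # 1
tn = np.sum((target == 0) & (prediction == 0))  # 4
fp = np.sum((target == 0) & (prediction == 1))  # 2

# Recall = TPR = 3 / 4; AUC on hard labels = (TPR + TNR) / 2 = (3/4 + 4/6) / 2
recall_manual = tp / (tp + fn)
auc_manual = (tp / (tp + fn) + tn / (tn + fp)) / 2

print(recall_score(target, prediction) * 100, recall_manual * 100)
print(roc_auc_score(target, prediction) * 100, auc_manual * 100)
```

Both lines print matching pairs (75.0 for recall, about 70.8 for ROC/AUC), which shows that an AUC computed from hard predictions rewards getting both classes right, not just the leavers.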

This exercise is part of the course

HR Analytics: Predicting Employee Churn in Python

Exercise instructions

Initialize the best model using the parameters provided in the description.
Fit the model using only the selected features from the training set.
Make a prediction based on the selected features from the test set.
Print the accuracy, recall and ROC/AUC scores of the model.

Hands-on interactive exercise

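Here is the sample code with the blanks completed:

# Import the classifier and the evaluation metrics
# (features_train_selected, features_test_selected, target_train and target_test come from the workspace)
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import recall_score, roc_auc_score

# Initialize the best model using the parameters provided in the description
model_best = DecisionTreeClassifier(max_depth=8, min_samples_leaf=150, class_weight="balanced", random_state=42)

# Fit the model using only the selected features from the training set
model_best.fit(features_train_selected, target_train)

# Make predictions based on the selected features from the test set
prediction_best = model_best.predict(features_test_selected)

# Print the general accuracy of model_best, as a percentage
print(model_best.score(features_test_selected, target_test) * 100)

# Print the recall score of the model predictions
print(recall_score(target_test, prediction_best) * 100)

# Print the ROC/AUC score of the model predictions
print(roc_auc_score(target_test, prediction_best) * 100)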
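Because features_train_selected, features_test_selected, target_train and target_test only exist in the course workspace, the exercise code cannot run on its own. The self-contained sketch below substitutes synthetic, imbalanced data (an assumption, not the HR dataset) and computes ROC/AUC two ways: from hard class predictions and from predict_proba scores. Scoring probabilities is the more usual way to report AUC, since it ranks employees by estimated risk instead of collapsing them to 0/1.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import recall_score, roc_auc_score

# Synthetic, imbalanced stand-in for the selected HR features (assumption)
X, y = make_classification(n_samples=2000, n_features=5, n_informative=3,
                           n_redundant=0, weights=[0.75, 0.25], random_state=42)
features = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(5)])
(features_train_selected, features_test_selected,
 target_train, target_test) = train_test_split(
    features, y, test_size=0.25, stratify=y, random_state=42)

# Same hyperparameters as the exercise
model_best = DecisionTreeClassifier(max_depth=8, min_samples_leaf=150,
                                    class_weight="balanced", random_state=42)
model_best.fit(features_train_selected, target_train)

# Hard class labels, and the probability of class 1 (leaving)
# taken from the class-weighted proportions in each leaf
prediction_best = model_best.predict(features_test_selected)
proba_best = model_best.predict_proba(features_test_selected)[:, 1]

accuracy = model_best.score(features_test_selected, target_test) * 100
recall = recall_score(target_test, prediction_best) * 100
auc_hard = roc_auc_score(target_test, prediction_best) * 100
auc_proba = roc_auc_score(target_test, proba_best) * 100

print(f"Accuracy: {accuracy:.1f}%")
print(f"Recall: {recall:.1f}%")
print(f"ROC/AUC (hard labels): {auc_hard:.1f}%")
print(f"ROC/AUC (probabilities): {auc_proba:.1f}%")
```

On real data, expect recall to matter most for turnover: a missed leaver is usually costlier than a false alarm, which is why class_weight="balanced" is part of the best model.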
