Develop and test the best model
In Chapter 3, you found that the following parameters give you a better model:
- max_depth = 8
- min_samples_leaf = 150
- class_weight = "balanced"
In this chapter, you discovered that some of the features have a negligible impact. You realized that you could get accurate predictions using just a small number of selected, impactful features, and you updated your training and testing sets accordingly, creating the variables features_train_selected and features_test_selected.
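The feature selection described above can be sketched roughly as follows. This is only an illustration on synthetic data: the dataset, the column names (feature_0, feature_1, ...) and the 1% importance cutoff are assumptions made for the example, not values taken from the course.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the HR data (assumption: the real dataset is not shown here)
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=3, n_redundant=0, random_state=42
)
features = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(10)])
features_train, features_test, target_train, target_test = train_test_split(
    features, y, test_size=0.25, random_state=42
)

# Fit a tree on all features using the parameters listed above
model = DecisionTreeClassifier(
    max_depth=8, min_samples_leaf=150, class_weight="balanced", random_state=42
)
model.fit(features_train, target_train)

# Rank features by the importance the fitted tree assigns to them
importances = pd.Series(model.feature_importances_, index=features_train.columns)

# Keep only features above a cutoff (hypothetical value chosen for this sketch)
threshold = 0.01
selected = importances[importances > threshold].index.tolist()

# Restrict both sets to the same selected columns
features_train_selected = features_train[selected]
features_test_selected = features_test[selected]

print(importances.sort_values(ascending=False))
print(selected)
```

Applying the same column list to both sets keeps the training and testing data aligned, which the model needs at prediction time.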
With all this information at your disposal, you're now going to develop the best model for predicting employee turnover and evaluate it using the appropriate metrics.
The features_train_selected and features_test_selected variables are available in your workspace, and the recall_score and roc_auc_score functions have been imported for you.
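To see what these metrics measure, here is a tiny hand-checkable example. The labels and predictions are made up for illustration (1 standing for an employee who left). Note that when roc_auc_score receives hard 0/1 predictions rather than probabilities, it reduces to the average of recall and specificity.

```python
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score

# Made-up labels: 4 employees stayed (0), 2 left (1)
target = [0, 0, 0, 0, 1, 1]
# Made-up predictions: one false alarm, one missed leaver
prediction = [0, 0, 0, 1, 1, 0]

# Accuracy: 4 of 6 predictions are correct
accuracy = accuracy_score(target, prediction)

# Recall: 1 of the 2 actual leavers was found
recall = recall_score(target, prediction)

# ROC/AUC on hard labels: mean of recall (0.5) and specificity (3/4)
roc_auc = roc_auc_score(target, prediction)

print(accuracy, recall, roc_auc)
```

Accuracy alone can look good on imbalanced data even when many leavers are missed, which is why recall and ROC/AUC are reported alongside it.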
This exercise is part of the course
HR Analytics: Predicting Employee Churn in Python
Exercise instructions
- Initialize the best model using the parameters provided in the description.
- Fit the model using only the selected features from the training set.
- Make a prediction based on the selected features from the test set.
- Print the accuracy, recall and ROC/AUC scores of the model.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Initialize the best model using parameters provided in description
model_best = DecisionTreeClassifier(____=____, ____=____, ____=____, random_state=42)
# Fit the model using only selected features from training set
model_best.fit(____, target_train)
# Make prediction based on selected list of features from test set
prediction_best = model_best.____(____)
# Print the general accuracy of the model_best
print(____.score(features_test_selected, target_test) * 100)
# Print the recall score of the model predictions
print(____(target_test, prediction_best) * 100)
# Print the ROC/AUC score of the model predictions
print(roc_auc_score(target_test, ____) * 100)
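For reference, the whole workflow can be run end to end on synthetic data. This is a minimal sketch under stated assumptions: the data comes from make_classification rather than the course's HR dataset, the class imbalance is arbitrary, and the synthetic features simply stand in for the already selected ones, so the printed scores will not match the course results.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (assumption: not the real HR dataset)
X, y = make_classification(
    n_samples=5000, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.76, 0.24], random_state=42,
)
features_train_selected, features_test_selected, target_train, target_test = (
    train_test_split(X, y, test_size=0.25, stratify=y, random_state=42)
)

# Initialize the model with the parameters from the description
model_best = DecisionTreeClassifier(
    max_depth=8, min_samples_leaf=150, class_weight="balanced", random_state=42
)

# Fit on the training features, then predict on the test features
model_best.fit(features_train_selected, target_train)
prediction_best = model_best.predict(features_test_selected)

# Compute the three scores as percentages
accuracy = model_best.score(features_test_selected, target_test) * 100
recall = recall_score(target_test, prediction_best) * 100
roc_auc = roc_auc_score(target_test, prediction_best) * 100

print(accuracy)
print(recall)
print(roc_auc)
```

Here roc_auc_score is given the hard class predictions; passing model_best.predict_proba(features_test_selected)[:, 1] instead would score the ranking of predicted probabilities and usually yields a different value.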