Confusion matrices and accuracy of our final trees

Over the past few exercises, you have constructed four pruned decision trees. As you can see, the eventual number of splits varies quite a bit from one tree to the next:

ptree_undersample  # 7 splits
ptree_prior  # 9 splits
ptree_loss_matrix  # 24 splits
ptree_weights  # 6 splits

Now it is important to know which tree performs best in terms of accuracy. To obtain the accuracy, you will start by making predictions on the test set and constructing the confusion matrix for each of these trees. Include the argument type = "class" when making these predictions; this way, there is no need to set a cut-off.

Nevertheless, keep in mind that accuracy is not the only measure that matters: sensitivity and specificity are important as well. Additionally, predicting probabilities instead of binary values (0 or 1) has the advantage that the cut-off can be varied. Then again, the difficulty here is choosing that cut-off. You will return to this in the next chapter.
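As a side note, probability predictions with a manual cut-off can be sketched as follows. This is an illustration only, not part of the exercise: the toy data frame, the variable names (toy, age), and the 0.5 cut-off are all assumptions made up for this sketch.

```r
library(rpart)

# Hypothetical toy data: default is more likely for younger borrowers
set.seed(1)
n <- 300
age <- rnorm(n, 40, 10)
loan_status <- factor(ifelse(age + rnorm(n, 0, 8) < 35, 1, 0))
toy <- data.frame(loan_status, age)

fit <- rpart(loan_status ~ age, data = toy, method = "class")

# type = "prob" returns a matrix of class probabilities; take the
# column for class "1" (default)
probs <- predict(fit, newdata = toy, type = "prob")[, "1"]

# With probabilities, you pick the cut-off yourself
cutoff <- 0.5
pred_class <- ifelse(probs > cutoff, 1, 0)
table(toy$loan_status, pred_class)
```

Moving cutoff up or down trades sensitivity against specificity, which is exactly the flexibility that type = "class" hides from you.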

In case you needed a reminder, here is how to compute the accuracy: $$\textrm{Classification accuracy} = \frac{(TP + TN)}{(TP + FP + TN + FN)}$$
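A quick worked example of this formula, using an invented 2x2 confusion matrix (the numbers below are illustrative, not from the exercise data):

```r
# Truth in rows, prediction in columns:
#            pred 0  pred 1
# true 0        800      40    (TN, FP)
# true 1         60     100    (FN, TP)
confmat <- matrix(c(800, 60, 40, 100), nrow = 2,
                  dimnames = list(truth = c("0", "1"),
                                  pred  = c("0", "1")))

# Correct predictions sit on the diagonal (TN and TP)
acc <- sum(diag(confmat)) / sum(confmat)
acc  # (800 + 100) / 1000 = 0.9
```

Note that sum(confmat) equals the number of test observations, which is why the sample code below divides by nrow(test_set) instead.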

This exercise is part of the course Credit Risk Modeling in R.

Exercise instructions

  • Use predict() to make predictions for all four trees. The test_set should be included in the argument newdata. Don't forget to include type = "class"!
  • Construct confusion matrices for each of these decision trees. Use the function table(), and include the "true" status (using test_set$loan_status) first, followed by the prediction.
  • Compute the accuracy using each of the confusion matrices.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Make predictions for each of the pruned trees using the test set.
pred_undersample <- predict(ptree_undersample, newdata = test_set, type = "class")
pred_prior <- predict(ptree_prior, newdata = test_set, type = "class")
pred_loss_matrix <- predict(ptree_loss_matrix, newdata = test_set, type = "class")
pred_weights <- predict(ptree_weights, newdata = test_set, type = "class")

# Construct confusion matrices using the predictions
confmat_undersample <- table(test_set$loan_status, pred_undersample)
confmat_prior <- table(test_set$loan_status, pred_prior)
confmat_loss_matrix <- table(test_set$loan_status, pred_loss_matrix)
confmat_weights <- table(test_set$loan_status, pred_weights)

# Compute the accuracies
acc_undersample <- sum(diag(confmat_undersample)) / nrow(test_set)
acc_prior <- sum(diag(confmat_prior)) / nrow(test_set)
acc_loss_matrix <- sum(diag(confmat_loss_matrix)) / nrow(test_set)
acc_weights <- sum(diag(confmat_weights)) / nrow(test_set)