Pruning the tree with the loss matrix
In this exercise, you will prune the tree that was built with a loss matrix, which penalizes misclassified defaults more heavily than misclassified non-defaults.
This exercise is part of the course Credit Risk Modeling in R.
Exercise instructions
- Run the code to set a seed and construct `tree_loss_matrix` again.
- Use the function `plotcp()` to examine the cross-validated error structure.
- Looking at the cp plot, you will notice that pruning the tree using the minimum cross-validated error would lead to a tree that is as big as the unpruned tree, because the cross-validated error reaches its minimum at `cp = 0.001`. Because you would like to make the tree somewhat smaller, prune the tree using `cp = 0.0012788` instead; for this complexity parameter, the cross-validated error approaches the minimum observed error (see the sketch after this list for how to read these values from the cp table). Call the pruned tree `ptree_loss_matrix`.
- The package `rpart.plot` is loaded in your workspace. Plot the pruned tree using the function `prp()`, including the argument `extra = 1`.
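If you prefer the numbers behind the cp plot to the picture, you can also inspect the fitted tree's cp table directly. The sketch below is a supplement to the exercise, not part of it; it uses rpart's `printcp()` and the `cptable` field of the fitted object:

# Print the complexity-parameter table: one row per candidate subtree,
# with its CP value, number of splits, and cross-validated error (xerror)
printcp(tree_loss_matrix)

# CP value with the lowest cross-validated error; for this tree it is the
# smallest value tried (cp = 0.001), which corresponds to the unpruned tree
cp_min <- tree_loss_matrix$cptable[which.min(tree_loss_matrix$cptable[, "xerror"]), "CP"]

Because the minimum sits at the unpruned tree, the exercise instead picks the slightly larger cp = 0.0012788, whose cross-validated error is still close to that minimum.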
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Load the required packages (both are preloaded in the course workspace)
library(rpart)
library(rpart.plot)
# Set a seed and construct the tree with the loss matrix again; in the loss
# matrix (rows = observed, columns = predicted), misclassifying a default
# as a non-default costs 10 times the reverse error
set.seed(345)
tree_loss_matrix <- rpart(loan_status ~ ., method = "class", data = training_set,
                          parms = list(loss = matrix(c(0, 10, 1, 0), ncol = 2)),
                          control = rpart.control(cp = 0.001))
# Plot the cross-validated error rate as a function of the complexity parameter
# Prune the tree using cp = 0.0012788
# Use prp() and argument extra = 1 to plot the pruned tree
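For reference, here is one way the scaffold could be completed. This is a sketch of a possible solution under the setup above (the fitted `tree_loss_matrix`, with rpart and rpart.plot loaded), not the course's official answer:

# Plot the cross-validated error rate as a function of the complexity parameter
plotcp(tree_loss_matrix)

# Prune the tree using cp = 0.0012788
ptree_loss_matrix <- prune(tree_loss_matrix, cp = 0.0012788)

# Plot the pruned tree; extra = 1 adds the number of observations
# of each class to every node
prp(ptree_loss_matrix, extra = 1)

Pruning at cp = 0.0012788 keeps the cross-validated error close to its minimum while producing a noticeably smaller, easier-to-read tree.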