One final tree using more options

In this exercise, you will use some final arguments that were discussed in the video. You will change some settings in rpart.control() and include case weights through the weights argument of rpart(). The vector case_weights has been constructed for you and is loaded in your workspace. It contains weights of 1 for the non-defaults in the training set and weights of 3 for the defaults. Because defaults carry a higher weight, misclassifying a default costs more when the tree is fit, so the model gives more importance to classifying defaults correctly.
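The exercise supplies case_weights ready-made. As a rough sketch of how such a vector could be built, the snippet below uses a small made-up data frame and assumes loan_status is coded 1 for default and 0 for non-default; the names toy_training_set and toy_case_weights are hypothetical and not part of the course workspace.

```r
# Toy stand-in for training_set (hypothetical data, for illustration only)
toy_training_set <- data.frame(loan_status = c(0, 0, 1, 0, 1, 0))

# Weight 3 for defaults (assumed to be coded 1), weight 1 for non-defaults
toy_case_weights <- ifelse(toy_training_set$loan_status == 1, 3, 1)

toy_case_weights
```

The vector has one entry per row of the training data, which is what the weights argument of rpart() expects.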

This exercise is part of the course

Credit Risk Modeling in R

Exercise instructions

  • Set a seed of 345.
  • Add to the provided code by passing case_weights to the weights argument of rpart().
  • Set the minimum number of observations a node must contain before a split is attempted to 5, and the minimum number of observations allowed in a leaf node to 2, using the minsplit and minbucket arguments of rpart.control(), respectively.
  • Use function plotcp() to investigate where the cross-validated error rate can be minimized.
  • Use which.min() to identify the row with the minimum "xerror" in tree_weights$cp. Assign this to index.
  • Use the provided code to select the cp for which the cross-validated error is minimized.
  • Prune the tree using the complexity parameter where the cross-validated error rate is minimized. Store the pruned tree in ptree_weights.
  • Plot the pruned tree using function prp(). Include a second argument extra and set it equal to 1.
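The steps above can be tried end to end on a different data set before attempting the exercise. The following is a minimal sketch that uses the kyphosis data shipped with rpart instead of the course's training_set; the weights of 3 for the "present" class and all demo_ names are assumptions made for illustration. It spells out the full element name cptable; the shorter $cp used in the exercise appears to resolve to the same table through R's partial matching of list names.

```r
library(rpart)

# Cross-validation inside rpart() is random, so fix the seed first
set.seed(345)

# Illustrative weights: 3 for the rarer "present" class, 1 otherwise
demo_weights <- ifelse(kyphosis$Kyphosis == "present", 3, 1)

# Grow a deliberately large tree with weights, minsplit and minbucket
demo_tree <- rpart(Kyphosis ~ ., data = kyphosis, method = "class",
                   weights = demo_weights,
                   control = rpart.control(minsplit = 5, minbucket = 2,
                                           cp = 0.001))

# Plot the cross-validated error against cp
plotcp(demo_tree)

# Row of the cp table with the smallest cross-validated error
demo_index <- which.min(demo_tree$cptable[, "xerror"])
demo_cp <- demo_tree$cptable[demo_index, "CP"]

# Prune back to that complexity parameter
demo_ptree <- prune(demo_tree, cp = demo_cp)

# prp() comes from the rpart.plot package, which may not be installed;
# the plot is skipped if pruning left only the root node
if (requireNamespace("rpart.plot", quietly = TRUE) &&
    nrow(demo_ptree$frame) > 1) {
  rpart.plot::prp(demo_ptree, extra = 1)
}
```

With extra = 1, prp() adds the number of observations per class to each node, which makes it easier to see how the weighted tree separates the two classes.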

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# set a seed and run the code to obtain a tree using weights, minsplit and minbucket
set.seed(345)
tree_weights <- rpart(loan_status ~ ., method = "class",
                      data = training_set,
                      control = rpart.control(minsplit = ___, minbucket = ___, cp = 0.001))

# Plot the cross-validated error rate for a changing cp


# Create an index for the row with the minimum xerror
index <- which.min(___$___[ , "xerror"])

# Create tree_min
tree_min <- tree_weights$cp[index, "CP"]

# Prune the tree using tree_min


# Plot the pruned tree using prp() from the rpart.plot package