Exercise

One final tree using more options

In this exercise, you will use some final arguments that were discussed in the video. You will change some settings in rpart.control() and supply case weights through the weights argument of rpart(). The vector case_weights has been constructed for you and is loaded in your workspace. It contains a weight of 1 for each non-default and a weight of 3 for each default in the training set. Giving defaults a higher weight makes the model assign more importance to classifying defaults correctly.
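For reference, a weight vector like this could be built as in the sketch below. The loan_status vector here is a small hypothetical stand-in (1 = default, 0 = non-default), not the actual training set; in the exercise, case_weights is already provided.

```r
# Hypothetical stand-in for the default indicator in the training set
# (1 = default, 0 = non-default); the real data is not reproduced here.
loan_status <- c(0, 0, 1, 0, 1, 0)

# Weight 3 for defaults, weight 1 for non-defaults
case_weights <- ifelse(loan_status == 1, 3, 1)

print(case_weights)
# [1] 1 1 3 1 3 1
```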

Instructions

  • Set a seed of 345.
  • Add to the provided code by passing case_weights to the weights argument of rpart().
  • Change the minimum number of observations a node must contain before a split is attempted to 5, and the minimum number of observations allowed in leaf nodes to 2, by using the arguments minsplit and minbucket in rpart.control(), respectively.
  • Use function plotcp() to investigate where the cross-validated error rate can be minimized.
  • Use which.min() to identify the row with the minimum "xerror" in tree_weights$cptable. Assign this to index.
  • Use the provided code to select the cp for which the cross-validated error is minimized.
  • Prune the tree using the complexity parameter where the cross-validated error rate is minimized. Store the pruned tree in ptree_weights.
  • Plot the pruned tree using function prp(). Include a second argument extra and set it equal to 1.
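
The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the exercise solution: it uses the kyphosis data set that ships with rpart as a stand-in for the loan training set, the formula and the cp = 0.001 starting value are illustrative choices, and cp_min is a hypothetical variable name. The prp() call assumes the rpart.plot package is installed and is skipped otherwise.

```r
library(rpart)

# Fix the seed so the cross-validation folds are reproducible
set.seed(345)

# Stand-in data (assumption): kyphosis ships with rpart.
# "present" plays the role of the default class here.
train <- kyphosis
case_weights <- ifelse(train$Kyphosis == "present", 3, 1)

# Grow a large tree with case weights and relaxed node-size limits.
# cp = 0.001 is an illustrative starting value, not taken from the exercise text.
tree_weights <- rpart(Kyphosis ~ Age + Number + Start,
                      data = train,
                      method = "class",
                      weights = case_weights,
                      control = rpart.control(minsplit = 5,
                                              minbucket = 2,
                                              cp = 0.001))

# Plot the cross-validated error against the complexity parameter
plotcp(tree_weights)

# Row of the cp table with the lowest cross-validated error
index <- which.min(tree_weights$cptable[, "xerror"])

# Complexity parameter belonging to that row
cp_min <- tree_weights$cptable[index, "CP"]

# Prune the tree back to that complexity
ptree_weights <- prune(tree_weights, cp = cp_min)

# Plot the pruned tree; extra = 1 shows the observation counts per class in each node
if (requireNamespace("rpart.plot", quietly = TRUE)) {
  rpart.plot::prp(ptree_weights, extra = 1)
}
```

Because the weights change what counts as a costly mistake, the pruned tree will generally differ from one grown without weights on the same data.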