One final tree using more options
In this exercise, you will use some final arguments that were discussed in the video. Some settings in rpart.control() will be changed, and case weights will be included using the weights argument of rpart(). The vector case_weights has been constructed for you and is loaded in your workspace. This vector contains weights of 1 for the non-defaults in the training set and weights of 3 for the defaults. By specifying higher weights for defaults, the model assigns higher importance to classifying defaults correctly.
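As an illustration of the weighting scheme described above, here is a minimal sketch of how such a vector could be built. It assumes that loan_status is coded 1 for a default and 0 otherwise, and it uses a tiny hypothetical data frame in place of the course's training_set; the vector in your workspace may have been created differently.

```r
# Hypothetical stand-in for training_set (assumption: 1 = default, 0 = non-default)
training_set <- data.frame(loan_status = c(0, 0, 1, 0, 1, 0))

# Weight 3 for each default, weight 1 for each non-default
case_weights <- ifelse(training_set$loan_status == 1, 3, 1)

case_weights
# [1] 1 1 3 1 3 1
```

The vector must have one entry per row of the training data, in the same order, because rpart() matches weights to observations by position.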
This exercise is part of the course
Credit Risk Modeling in R
Exercise instructions
- Set a seed of 345.
- Add to the provided code by passing case_weights to the weights argument of rpart().
- Change the minimum number of observations that a node must contain before a split is attempted to 5, and the minimum number of observations allowed in leaf nodes to 2, by using the arguments minsplit and minbucket in rpart.control(), respectively.
- Use the function plotcp() to investigate where the cross-validated error rate can be minimized.
- Use which.min() to identify the row with the minimum "xerror" in tree_weights$cp. Assign this to index.
- Use the provided code to select the cp for which the cross-validated error is minimized.
- Prune the tree using the complexity parameter where the cross-validated error rate is minimized. Store the pruned tree in ptree_weights.
- Plot the pruned tree using the function prp(). Include a second argument extra and set it equal to 1.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# set a seed and run the code to obtain a tree using weights, minsplit and minbucket
set.seed(345)
tree_weights <- rpart(loan_status ~ ., method = "class",
                      data = training_set,
                      control = rpart.control(minsplit = ___, minbucket = ___, cp = 0.001))
# Plot the cross-validated error rate for a changing cp
# Create an index for the row with the minimum xerror
index <- which.min(___$___[ , "xerror"])
# Create tree_min
tree_min <- tree_weights$cp[index, "CP"]
# Prune the tree using tree_min
# Plot the pruned tree using the rpart.plot package
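
The steps above can be sketched end to end as follows. This is one possible completion, not an official solution. Because the course's training_set is not available outside the exercise, the sketch builds a hypothetical synthetic data set (the columns loan_amnt, int_rate and age and the helper p_default are illustrative assumptions), so the resulting tree will differ from the one you obtain in the exercise. It writes tree_weights$cptable in full; the shorter tree_weights$cp in the sample code refers to the same element through R's partial matching of list names with $.

```r
library(rpart)

# Hypothetical synthetic data standing in for training_set (assumption)
set.seed(345)
n <- 500
training_set <- data.frame(
  loan_amnt = round(runif(n, 1000, 35000)),
  int_rate  = round(runif(n, 5, 25), 2),
  age       = sample(21:70, n, replace = TRUE)
)
# Illustrative rule: defaults become more likely as the interest rate rises
p_default <- plogis(-4 + 0.2 * training_set$int_rate)
training_set$loan_status <- rbinom(n, 1, p_default)

# Weight 3 for defaults, weight 1 for non-defaults
case_weights <- ifelse(training_set$loan_status == 1, 3, 1)

# Fit the weighted tree with the minsplit and minbucket values from the instructions
set.seed(345)
tree_weights <- rpart(loan_status ~ ., method = "class",
                      data = training_set,
                      weights = case_weights,
                      control = rpart.control(minsplit = 5, minbucket = 2, cp = 0.001))

# Plot the cross-validated error rate against cp
plotcp(tree_weights)

# Row of the cp table with the smallest cross-validated error
index <- which.min(tree_weights$cptable[ , "xerror"])

# Complexity parameter at that row
tree_min <- tree_weights$cptable[index, "CP"]

# Prune the tree at that complexity parameter
ptree_weights <- prune(tree_weights, cp = tree_min)

# prp() comes from the rpart.plot package, which is not installed with base R
if (requireNamespace("rpart.plot", quietly = TRUE)) {
  rpart.plot::prp(ptree_weights, extra = 1)
}
```

With extra = 1, prp() prints the number of observations per class in each node, which makes it easier to see how the leaves split defaults from non-defaults.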