Pruning the tree with changed prior probabilities

In the video, you learned that pruning a tree is necessary to avoid overfitting. Some of the trees in the previous exercises were large, so you will now put what you learned into practice and prune the previously constructed tree with changed prior probabilities. The rpart package is already loaded in your workspace.

You will first set a seed to make the results reproducible, as mentioned in the video, because you will be examining cross-validated error results. Cross-validation involves randomness, so the results could differ slightly if you run the function again with a different seed.
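As a small illustration of why the seed matters, the sketch below fits the same tree twice with the same seed and once with a different one. It assumes the built-in iris data as a stand-in (the credit data from the exercise is not used here), and the seed values 567 and 568 are arbitrary choices for this example.

```r
library(rpart)

# Stand-in data: built-in iris, not the credit data from the exercise.
fit_tree <- function(seed) {
  set.seed(seed)  # fixes the random assignment of rows to cross-validation folds
  rpart(Species ~ ., data = iris, method = "class")
}

fit_a <- fit_tree(567)  # arbitrary seed
fit_b <- fit_tree(567)  # same seed again
fit_c <- fit_tree(568)  # a different seed

# Same seed: the cross-validated error column is reproduced exactly.
same_seed_match <- identical(fit_a$cptable[, "xerror"], fit_b$cptable[, "xerror"])
print(same_seed_match)

# The fitted splits do not depend on the seed; only xerror and xstd can change.
print(cbind(seed_567 = fit_a$cptable[, "xerror"],
            seed_568 = fit_c$cptable[, "xerror"]))
```

The CP, nsplit, and rel error columns come from the fitted tree itself, so they stay the same across seeds; only the cross-validated columns may move.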

In this exercise, you will learn to identify which complexity parameter (CP) minimizes the cross-validated error, then prune your tree based on this value.

This exercise is part of the course

Credit Risk Modeling in R

Exercise instructions

  • tree_prior is loaded in your workspace.
  • Use plotcp() to visualize the cross-validated error (X-val Relative Error) in relation to the complexity parameter for tree_prior.
  • Use printcp() to print a table of information about CP, splits, and errors. See if you can identify which split has the minimum cross-validated error in tree_prior.
  • Use which.min() to identify which row in tree_prior$cptable has the minimum cross-validated error "xerror". Assign this to index.
  • Create tree_min by selecting the value in row index of the column "CP" in tree_prior$cptable.
  • Use the prune() function to obtain the pruned tree. Call the pruned tree ptree_prior.
  • The rpart.plot package is loaded in your workspace. Plot the pruned tree using the prp() function with its default settings.
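The steps above can be sketched on a stand-in data set. This is only an illustration under stated assumptions: it uses the built-in iris data rather than the credit data, the names fit, cp_min, and pruned are hypothetical, the seed is arbitrary, and cp = 0.001 is chosen simply to grow a tree large enough to prune. It is not the solution for tree_prior.

```r
library(rpart)

set.seed(567)  # arbitrary seed, used only to make this illustration reproducible

# Stand-in tree on iris; a small cp lets the tree grow before pruning.
fit <- rpart(Species ~ ., data = iris, method = "class",
             control = rpart.control(cp = 0.001))

# Visualize and print the cross-validated error per complexity parameter.
plotcp(fit)
printcp(fit)

# Row of the CP table with the smallest cross-validated error.
index <- which.min(fit$cptable[, "xerror"])

# Complexity parameter in that row.
cp_min <- fit$cptable[index, "CP"]

# Prune the tree back to that complexity parameter.
pruned <- prune(fit, cp = cp_min)

# Plot the pruned tree with prp() if rpart.plot happens to be installed.
if (requireNamespace("rpart.plot", quietly = TRUE)) {
  rpart.plot::prp(pruned)
}
```

Because which.min() returns the first minimum, ties in xerror resolve to the row with fewer splits, which favors the smaller tree.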

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# tree_prior is loaded in your workspace

# Plot the cross-validated error rate as a function of the complexity parameter


# Use printcp() to identify for which complexity parameter the cross-validated error rate is minimized.


# Create an index for the row with the minimum xerror
index <- which.min(___$___[ , "xerror"])

# Create tree_min
tree_min <- tree_prior$cptable[index, "CP"]

# Prune the tree using tree_min
ptree_prior <- prune(___, cp = ___)

# Use prp() to plot the pruned tree