Pruning the tree with changed prior probabilities
In the video, you have learned that pruning a tree is necessary to avoid overfitting. There were some big trees in the previous exercises and now you will put what you have learned into practice, and prune the previously constructed tree with the changed prior probabilities. The rpart package is already loaded in your workspace.
You will first set a seed to make sure the results are reproducible as mentioned in the video, because you will be examining cross-validated error results. Results involve randomness and could differ slightly upon running the function again with a different seed.
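As a rough illustration of why the seed matters, the sketch below fits the same tree twice with the same seed and compares the cross-validated errors. It does not use the course data: it assumes rpart's bundled kyphosis dataset as a stand-in, and the seed value 567 is an arbitrary choice.

```r
library(rpart)

# Assumption: kyphosis (shipped with rpart) stands in for the course data.
# rpart assigns cross-validation folds at random, so the seed is reset
# immediately before each fit.
set.seed(567)  # arbitrary seed value
fit_a <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

set.seed(567)  # same seed again, so the same folds are drawn
fit_b <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Compare the cross-validated error column of the two CP tables
same_xerror <- identical(fit_a$cptable[, "xerror"], fit_b$cptable[, "xerror"])
print(same_xerror)
```

Without the second set.seed() call, the folds would generally differ and the xerror values could change slightly between fits.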
In this exercise you will learn to identify which complexity parameter (CP) will minimize the cross-validated error results, then prune your tree based on this value.
This exercise is part of the course
Credit Risk Modeling in R
Exercise instructions
tree_prior is loaded in your workspace.
- Use plotcp() to visualize the cross-validated error (X-val Relative Error) in relation to the complexity parameter for tree_prior.
- Use printcp() to print a table of information about CP, splits, and errors. See if you can identify which split has the minimum cross-validated error in tree_prior.
- Use which.min() to identify which row in tree_prior$cptable has the minimum cross-validated error "xerror". Assign this to index.
- Create tree_min by selecting the index of tree_prior$cptable within the column "CP".
- Use the prune() function to obtain the pruned tree. Call the pruned tree ptree_prior.
- Package rpart.plot is loaded in your workspace. Plot the pruned tree using function prp() (default setting).
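The steps above can be sketched on a different dataset, so as not to give away the exercise. This is a minimal sketch under stated assumptions: it uses rpart's bundled kyphosis data rather than the course data, and the prior c(0.7, 0.3), the seed, and the control settings are illustrative values, not the ones used to build tree_prior.

```r
library(rpart)

set.seed(567)  # arbitrary seed so the cross-validated errors are reproducible

# Assumption: kyphosis stands in for the course data. cp = 0 and a small
# minsplit grow a deliberately large tree; the prior is an illustrative value.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class",
             parms = list(prior = c(0.7, 0.3)),
             control = rpart.control(cp = 0, minsplit = 5))

# Visualize and tabulate cross-validated error against the complexity parameter
plotcp(fit)
printcp(fit)

# Row of the CP table with the smallest cross-validated error
index <- which.min(fit$cptable[, "xerror"])

# Complexity parameter belonging to that row
cp_min <- fit$cptable[index, "CP"]

# Prune the large tree back to the subtree selected by cp_min
pruned <- prune(fit, cp = cp_min)
```

On a small dataset such as this one, the minimum can fall on the first row of the table, in which case the pruned tree is just the root node; plotting it is therefore left out of the sketch.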
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# tree_prior is loaded in your workspace
# Plot the cross-validated error rate as a function of the complexity parameter
# Use printcp() to identify for which complexity parameter the cross-validated error rate is minimized.
# Create an index for the row with the minimum xerror
index <- which.min(___$___[ , "xerror"])
# Create tree_min
tree_min <- tree_prior$cptable[index, "CP"]
# Prune the tree using tree_min
ptree_prior <- prune(___, cp = ___)
# Use prp() to plot the pruned tree