Creating a nicely pruned tree
Stopping a tree from growing to its full depth may cause it to overlook subtle aspects of the data or miss important trends it would have discovered later.
By using post-pruning, you can intentionally grow a large and complex tree, then prune it back to a smaller, more efficient size afterwards.
In this exercise, you will have the opportunity to construct a visualization of the tree's performance versus complexity, and use this information to prune the tree to an appropriate level.
The rpart package has been pre-loaded, along with loans_test and loans_train.
This exercise is part of the course Supervised Learning in R: Classification.
Exercise instructions
- Use all of the applicant variables and no pre-pruning to create an overly complex tree. Make sure to set cp = 0 in rpart.control() to prevent pre-pruning.
- Create a complexity plot by using plotcp() on the model.
- Based on the complexity plot, prune the tree to a complexity of 0.0014 using the prune() function with the tree and the complexity parameter.
- Compare the accuracy of the pruned tree to the original accuracy of 58.3%. To calculate the accuracy, use the predict() and mean() functions.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Grow an overly complex tree
loan_model <- ___
# Examine the complexity plot
plotcp(___)
# Prune the tree
loan_model_pruned <- ___(___, cp = ___)
# Compute the accuracy of the pruned tree
loans_test$pred <- ___
mean(___)
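If you want to see the four steps fitted together, here is one possible sketch of the workflow. Since the pre-loaded loans_train and loans_test data frames are not available outside the exercise environment, this sketch substitutes a small simulated data frame with a made-up outcome column and two made-up predictors; the cp value of 0.0014 comes from the exercise's complexity plot, not from the simulated data.

```r
# Assumption: loans_train / loans_test are stood in for by simulated data,
# because the real pre-loaded data frames are only available in the exercise.
library(rpart)

set.seed(42)
n <- 1000
sim <- data.frame(
  loan_amount = runif(n, 1000, 30000),
  income      = runif(n, 20000, 120000),
  outcome     = factor(sample(c("default", "repaid"), n, replace = TRUE))
)
loans_train <- sim[1:750, ]
loans_test  <- sim[751:1000, ]

# Grow an overly complex tree: cp = 0 disables pre-pruning,
# so rpart keeps splitting regardless of the complexity penalty
loan_model <- rpart(outcome ~ ., data = loans_train, method = "class",
                    control = rpart.control(cp = 0))

# Examine the complexity plot (cross-validated error vs. tree size)
plotcp(loan_model)

# Prune the tree at the complexity level chosen from the plot
# (0.0014 is the value given in the exercise, not derived from this simulation)
loan_model_pruned <- prune(loan_model, cp = 0.0014)

# Compute the accuracy of the pruned tree on the test set
loans_test$pred <- predict(loan_model_pruned, loans_test, type = "class")
mean(loans_test$pred == loans_test$outcome)
```

On the real loans data, the pruned tree's accuracy can then be compared against the original 58.3% mentioned in the instructions.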