Building and evaluating a larger tree
Previously, you created a simple decision tree that used the applicant's credit score and requested loan amount to predict the loan outcome.
Lending Club has additional information about the applicants, such as home ownership status, length of employment, loan purpose, and past bankruptcies, that may be useful for making more accurate predictions.
Using all of the available applicant data, build a more sophisticated lending model using the random training dataset created previously. Then, use this model to make predictions on the testing dataset to estimate the performance of the model on future loan applications.
The rpart package has been pre-loaded, and the loans_train and loans_test datasets have been created.
This exercise is part of the course Supervised Learning in R: Classification.
Exercise instructions
- Use rpart() to build a loan model using the training dataset and all of the available predictors. Again, leave the control argument alone.
- Applying the predict() function to the testing dataset, create a vector of predicted outcomes. Don't forget the type argument.
- Create a table() to compare the predicted values to the actual outcome values.
- Compute the accuracy of the predictions using the mean() function.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Grow a tree using all of the available applicant data
loan_model <- rpart(___, data = ___, method = "___", control = rpart.control(cp = 0))
# Make predictions on the test dataset
loans_test$pred <- ___
# Examine the confusion matrix
table(___, ___)
# Compute the accuracy on the test dataset
mean(___)
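For reference, here is one way the blanks might be completed. Treat it as a sketch rather than the official solution: it assumes the target column in both datasets is named outcome (as referenced in the instructions above) and that every remaining column is used as a predictor via the outcome ~ . formula.

library(rpart)  # already pre-loaded in the exercise environment

# Grow a tree using all of the available applicant data
loan_model <- rpart(outcome ~ ., data = loans_train, method = "class",
                    control = rpart.control(cp = 0))

# Make predictions on the test dataset; type = "class" returns predicted labels
loans_test$pred <- predict(loan_model, loans_test, type = "class")

# Examine the confusion matrix of predicted vs. actual outcomes
table(loans_test$pred, loans_test$outcome)

# Compute the accuracy: the proportion of test cases predicted correctly
mean(loans_test$pred == loans_test$outcome)

Comparing this accuracy to that of the simpler two-predictor tree gives a rough sense of whether the additional applicant data improves performance on unseen loan applications.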