Building and evaluating a larger tree
Previously, you created a simple decision tree that used the applicant's credit score and requested loan amount to predict the loan outcome.
Lending Club has additional information about the applicants, such as home ownership status, length of employment, loan purpose, and past bankruptcies, that may be useful for making more accurate predictions.
Using all of the available applicant data, build a more sophisticated lending model using the random training dataset created previously. Then, use this model to make predictions on the testing dataset to estimate the performance of the model on future loan applications.
The rpart package has been pre-loaded, and the loans_train and loans_test datasets have been created.
This exercise is part of the course
Supervised Learning in R: Classification
Exercise instructions
- Use rpart() to build a loan model using the training dataset and all of the available predictors. Again, leave the control argument alone.
- Applying the predict() function to the testing dataset, create a vector of predicted outcomes. Don't forget the type argument.
- Create a table() to compare the predicted values to the actual outcome values.
- Compute the accuracy of the predictions using the mean() function.
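The four steps above can be sketched on a small synthetic dataset. This is only an illustration under stated assumptions: the toy data frame, its columns (credit_score, loan_amount, outcome), the way outcomes are simulated, and the 75/25 split are all hypothetical and are not the Lending Club data. It also uses rpart's default control settings rather than the ones supplied in the exercise.

```r
library(rpart)

set.seed(42)

# Hypothetical toy data standing in for the loan applications
n <- 400
toy <- data.frame(
  credit_score = sample(c("LOW", "AVERAGE", "HIGH"), n, replace = TRUE),
  loan_amount  = sample(c("LOW", "MEDIUM", "HIGH"), n, replace = TRUE),
  stringsAsFactors = TRUE
)

# Simulated outcome: default is more likely for lower credit scores
p_default <- ifelse(toy$credit_score == "LOW", 0.7,
             ifelse(toy$credit_score == "AVERAGE", 0.4, 0.15))
toy$outcome <- factor(ifelse(runif(n) < p_default, "default", "repaid"))

# Random 75/25 split into training and test rows
train_rows <- sample(n, size = 0.75 * n)
toy_train <- toy[train_rows, ]
toy_test  <- toy[-train_rows, ]

# The formula outcome ~ . uses every other column as a predictor
toy_model <- rpart(outcome ~ ., data = toy_train, method = "class")

# type = "class" returns predicted labels rather than class probabilities
toy_test$pred <- predict(toy_model, toy_test, type = "class")

# Confusion matrix: predicted values (rows) against actual values (columns)
conf_mat <- table(toy_test$pred, toy_test$outcome)
print(conf_mat)

# Accuracy: share of test rows where the prediction matches the actual outcome
accuracy <- mean(toy_test$pred == toy_test$outcome)
print(accuracy)
```

Because the model is scored on rows it never saw during training, the accuracy here is an estimate of performance on new applications rather than a measure of fit to the training data.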
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Grow a tree using all of the available applicant data
loan_model <- rpart(___, data = ___, method = "___", control = rpart.control(cp = 0))
# Make predictions on the test dataset
loans_test$pred <- ___
# Examine the confusion matrix
table(___, ___)
# Compute the accuracy on the test dataset
mean(___)
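As a side note on the last two steps, the accuracy from mean() is the same number you get by summing the diagonal of the confusion matrix and dividing by the total count. The short sketch below checks this on hand-written label vectors; the vectors pred and actual are hypothetical values chosen for illustration, not results from the loan data.

```r
# Hypothetical predicted and actual labels, for illustration only
actual <- factor(c("repaid", "repaid", "default", "default", "repaid", "default"))
pred   <- factor(c("repaid", "default", "default", "repaid", "repaid", "default"))

# Confusion matrix: predictions in rows, actual outcomes in columns
conf_mat <- table(pred, actual)
print(conf_mat)

# Comparing the vectors gives TRUE/FALSE; mean() of that is the share of TRUEs
acc_from_mean <- mean(pred == actual)

# Correct predictions sit on the diagonal of the confusion matrix
acc_from_table <- sum(diag(conf_mat)) / sum(conf_mat)

# Here 4 of the 6 predictions match, so both values are 4/6
print(c(acc_from_mean, acc_from_table))
```

The diagonal shortcut relies on both factors sharing the same levels in the same order, which holds here because both vectors contain exactly the labels "default" and "repaid".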