ComenzarEmpieza gratis

Undersampling the training set

In the video, you saw that to overcome the unbalanced data problem, you can use under- or oversampling. The training set has been undersampled for you, such that 1/3 of the training set consists of defaults, and 2/3 of non-defaults. The resulting data set is available in your workspace and named undersampled_training_set, and contains less observations (6570 instead of 19394). In this exercise, you will create a decision tree using the undersampled data set.

You will notice that the trees in this and the next exercises are very big, so big that you cannot really read them anymore. Don't worry about this for now, we will tell you how you can make them more manageable in the next video!

Este ejercicio forma parte del curso

Credit Risk Modeling in R

Ver curso

Instrucciones del ejercicio

  • The rpart package has been installed for you. Load the package in your workspace.
  • Change the code provided such that a decision tree is constructed using the undersampled training set instead of training_set. Additionally, add the argument control = rpart.control(cp = 0.001). cp, which is the complexity parameter, is the threshold value for a decrease in overall lack of fit for any split. If cp is not met, further splits will no longer be pursued. cp's default value is 0.01, but for complex problems, it is advised to relax cp.
  • Plot the decision tree using the function plot and the tree object name. Add a second argument uniform = TRUE to get equal-sized branches.
  • The previous command simply creates a tree with some nodes and edges, but without any text (or so-called "labels") on it. Use function text() with sole argument tree_undersample to add labels.

Ejercicio interactivo práctico

Prueba este ejercicio completando el código de muestra.

# Load package rpart in your workspace.


# Change the code provided in the video such that a decision tree is constructed using the undersampled training set. Include rpart.control to relax the complexity parameter to 0.001.
tree_undersample <- rpart(loan_status ~ ., method = "class",
                          data =  training_set)

# Plot the decision tree


# Add labels to the decision tree
Editar y ejecutar código