What is a decision tree?

1. What is a decision tree?

Another way of classifying customers into groups of defaults and non-defaults is with decision trees. One major reason for the popularity of decision trees is their interpretability.

2. Decision tree example

Here's an example tree. By answering the questions from the top of the tree (the root) down to the bottom (the leaf nodes), a decision is made on whether to classify a certain instance as a default or a non-default case. There are several packages in R that construct decision trees automatically.
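As a minimal sketch of one such package, rpart: this assumes a hypothetical data frame training_set with a binary loan_status column (1 = default, 0 = non-default) and predictors such as age.

    # Hypothetical data: `training_set` with outcome `loan_status`
    # (1 = default, 0 = non-default) and predictors such as age.
    library(rpart)

    # Grow a classification tree automatically
    tree_model <- rpart(loan_status ~ ., data = training_set, method = "class")

    # Plot the tree structure and label the splits
    plot(tree_model, uniform = TRUE)
    text(tree_model)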

3. How to make a splitting decision?

Before doing so, I'd like to explain how a so-called "splitting decision" is made. The splitting decision at each node of the tree affects the final structure of the tree. Let's look at two possible splitting decisions for a categorical variable. How do we decide which question to ask: whether someone, let's say, rents a house or not, or whether someone either rents a house or belongs to the category "other", or not?

4. How to make a splitting decision?

For a continuous variable (let's say age), how do we choose the cutoff value for age that defines the split? The answer lies in defining a measure of impurity. In short, you would like to minimize the impurity in each node of the tree. A popular choice (and the default impurity measure in the rpart package) is the Gini measure. Let's look at an example.
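For reference, rpart exposes this choice through its parms argument; a small sketch, reusing the hypothetical training_set from above:

    # "gini" is rpart's default splitting index for classification trees;
    # "information" (entropy) is the built-in alternative.
    tree_gini <- rpart(loan_status ~ ., data = training_set,
                       method = "class", parms = list(split = "gini"))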

5. Example

Assume you have a training set with 500 cases, 250 defaults, and 250 non-defaults. You will often see the number of cases in each node represented like this,

6. Example

with the actual number of non-defaults on the left-hand side,

7. Example

and the actual number of defaults on the right-hand side. Of course,

8. Example

the ideal, though unrealistic, scenario would lead to a perfect split between defaults and non-defaults at the age of 40. As this is not the case,

9. Example

let's compute the impurity using the Gini measure. The impurity in a node according to the Gini measure is given by two times the proportion of non-defaults in that node times the proportion of defaults in that node: Gini = 2 × p(non-default) × p(default). Applying this formula leads to an impurity of 0.5 in the root node (the maximum possible value, reached when there are exactly as many defaults as non-defaults), an impurity of 0.4664 in the left node, and an impurity of 0.4536 in the right node. An important metric is the gain in purity that is achieved going from the root node to the two nodes N1 and N2.
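As a quick check of these numbers, here is the Gini formula in R. The 250/250 root counts come from the example; the counts inside N1 and N2 aren't stated here, so only the root impurity is recomputed from counts.

    # Gini impurity of a node: 2 * p(non-default) * p(default)
    gini <- function(n_nondefault, n_default) {
      n <- n_nondefault + n_default
      2 * (n_nondefault / n) * (n_default / n)
    }

    gini(250, 250)  # root node: 0.5, the maximum possible impurity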

10. Example

This gain can be computed by subtracting the weighted Gini measures of the nodes N1 and N2 from the Gini measure of the root: gain = Gini(root) − prop(N1) × Gini(N1) − prop(N2) × Gini(N2), where prop(N1) and prop(N2) are the proportions of cases that end up in each node. In our example, this leads to a gain of almost 0.04. The algorithm selects the split that leads to the highest gain.
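A sketch of that computation: the sizes of N1 and N2 aren't stated in this example, so a 50/50 split (250 cases each) is assumed purely for illustration; with those weights the gain works out to roughly the 0.04 mentioned above.

    # Purity gain = Gini(root) - weighted Gini of the child nodes.
    gini_root <- 0.5
    gini_n1   <- 0.4664   # impurity of the left node (from the example)
    gini_n2   <- 0.4536   # impurity of the right node (from the example)
    w1 <- 250 / 500       # assumed proportion of cases in N1
    w2 <- 250 / 500       # assumed proportion of cases in N2

    gain <- gini_root - w1 * gini_n1 - w2 * gini_n2
    gain  # about 0.04 under these assumed weights; the exact value
          # depends on the true node sizes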

11. Let's practice!

Now let's practice!