Exercise

# Computing the gain for a tree

In the video, you saw how the Gini measure is used to choose the best split for a tree. Now, you will compute the gain for the tree loaded in your workspace.

The data set contains 500 cases, 89 of which are defaults. This leads to a Gini of 0.292632 in the root node. As a reminder, the Gini of a node = 2 * proportion of defaults in this node * proportion of non-defaults in this node. Have a look at the code for a refresher.

```
gini_root <- 2 * (89 / 500) * (411 / 500)
```
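The same formula can be wrapped in a small helper so it can be reused for any node. This is only a sketch: the function name `gini_node` and its arguments are hypothetical and are not part of the exercise workspace.

```r
# Hypothetical helper: Gini of a node in a two-class (default / non-default) tree.
# n_default: number of defaults in the node
# n_total:   total number of cases in the node
gini_node <- function(n_default, n_total) {
  p_default <- n_default / n_total
  2 * p_default * (1 - p_default)
}

# Root node from the exercise: 89 defaults out of 500 cases
gini_root <- gini_node(89, 500)
print(gini_root)
```

Because the helper only needs two counts, it can be applied to the left and right leaf nodes in the same way once their counts are known.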

You will use these Gini measures to calculate the gain of the leaf nodes with respect to the root node. The following code gives an idea of how the Gini measures you created can be used to calculate the gain.

```
Gain = gini_root - (prop(cases left leaf) * gini_left) - (prop(cases right leaf) * gini_right)
```
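To make the formula concrete, here is a worked sketch in R. The leaf counts below are hypothetical and chosen only so that they add up to the root node (500 cases, 89 defaults); they are not the counts of `small_tree`, which you should read from the tree in your workspace.

```r
# Root node, as given in the exercise
n_root <- 500
gini_root <- 2 * (89 / n_root) * (411 / n_root)

# HYPOTHETICAL leaf counts, for illustration only (not taken from small_tree)
n_left <- 400;  defaults_left <- 60
n_right <- 100; defaults_right <- 29

# Gini of each leaf: 2 * proportion of defaults * proportion of non-defaults
gini_left <- 2 * (defaults_left / n_left) * ((n_left - defaults_left) / n_left)
gini_right <- 2 * (defaults_right / n_right) * ((n_right - defaults_right) / n_right)

# Gain: root Gini minus the leaf Ginis, each weighted by its share of the cases
gain <- gini_root - (n_left / n_root) * gini_left - (n_right / n_root) * gini_right
print(gain)
```

The weights `n_left / n_root` and `n_right / n_root` play the role of `prop(cases left leaf)` and `prop(cases right leaf)` in the formula.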

Compute the Gini in the left-hand and the right-hand node, and the gain of the two leaf nodes with respect to the root node. The object containing the tree is `small_tree`.

Instructions

- The computation for the Gini of the root node is given.
- Compute the Gini measure for the left leaf node.
- Compute the Gini measure for the right leaf node.
- Compute the gain by taking the difference between the root node Gini and the weighted leaf node Gini measures.
- Information regarding the split in this tree can be found using `$split` and the tree object, `small_tree`. Instead of gain, you should look at the `improve` column here. `improve` is an alternative metric for gain, simply obtained by multiplying gain by the number of cases in the data set. Make sure that the object `improve` (code given) has the same value as in `small_tree$split`.
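
As a rough sketch of that last step, the conversion from gain to `improve` is a single multiplication. The `gain` value below is a hypothetical placeholder, not the value for `small_tree`, and the comparison is left as a comment because `small_tree` only exists in the exercise workspace.

```r
# HYPOTHETICAL gain value, used only to illustrate the scaling step
gain <- 0.01

# Number of cases in the data set, as stated in the exercise
n_cases <- 500

# improve is the gain multiplied by the number of cases
improve <- gain * n_cases
print(improve)

# In the exercise workspace, compare this against the tree's own value:
# small_tree$split
```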