Exercise

# Correlated variables

The first 10 variables that are added to the model are the following:

```
['max_gift', 'number_gift', 'time_since_last_gift', 'mean_gift', 'income_high', 'age', 'country_USA', 'gender_F', 'income_low', 'country_UK']
```

As you can see, `min_gift`

is not added. Does this mean that it is a bad variable? You can test the performance of the variable by using it in a model as a single variable and calculating the AUC. How does the AUC of `min_gift`

compare to the AUC of `income_high`

? To this end, you can use the function `auc()`

:

```
auc(variables, target, basetable)
```

It can happen that a good variable is not added because it is highly correlated with a variable that is already in the model. You can test this calculating the correlation between these variables:

```
import numpy
numpy.corrcoef(basetable["variable_1"],basetable["variable_2"])[0,1]
```

Instructions

**100 XP**

- Calculate the AUC of the model using the variable
`min_gift`

only. - Calculate the AUC of the model using the variable
`income_high`

only. - Calculate the correlation between the variable
`min_gift`

and`mean_gift`

.