
Using evolution variables

1. Using evolution variables

In the previous exercises, you learned how to create evolution variables. In this video, we make the link with the first predictive analytics course and show how these variables can be used in the predictive model.

2. Building predictive models

To construct a logistic regression model, you need to import the linear_model module from sklearn. Next, you can select the variables from the basetable using indexing. Note that we add an evolution variable here, just like we would add any other variable. Then, you can initialize a logistic regression model and fit the model on the given predictors and target.
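The steps described here can be sketched as follows. This is only an illustration on synthetic data: the basetable contents and every column name (`age`, `gender_F`, `ratio_month_year`, `target`) are assumptions, not the course's actual data.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Synthetic stand-in for the basetable; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
donations_last_month = rng.integers(0, 4, size=n)
donations_last_year = donations_last_month + rng.integers(1, 10, size=n)
basetable = pd.DataFrame({
    "age": rng.integers(18, 80, size=n),
    "gender_F": rng.integers(0, 2, size=n),
    # Evolution variable: donations last month relative to the last year.
    "ratio_month_year": donations_last_month / donations_last_year,
})
# Synthetic target whose incidence rises with the evolution variable.
basetable["target"] = (rng.random(n) < 0.1 + 0.8 * basetable["ratio_month_year"]).astype(int)

# Select predictors (including the evolution variable) and the target.
variables = ["age", "gender_F", "ratio_month_year"]
X = basetable[variables]
y = basetable["target"]

# Initialize and fit the logistic regression model.
logreg = linear_model.LogisticRegression(max_iter=1000)
logreg.fit(X, y)
```

The evolution variable is just one more entry in the `variables` list; no special handling is needed.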

3. Making predictions

With this logistic regression object, you can easily make predictions using the predict_proba method, which returns, for each observation, the probability of being a target according to the model.
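As a minimal sketch of this step, assuming a model fitted on small synthetic data (the arrays `X` and `y` below are made up for illustration): `predict_proba` returns one column per class, so the probability of being a target is the column that belongs to class 1.

```python
import numpy as np
from sklearn import linear_model

# Synthetic predictors and binary target, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + rng.normal(scale=0.5, size=100) > 0).astype(int)

logreg = linear_model.LogisticRegression().fit(X, y)

# One row per observation, one column per class (ordered as logreg.classes_).
probabilities = logreg.predict_proba(X)

# Keep the column for class 1: the predicted probability of being a target.
predictions = probabilities[:, 1]
```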

4. Evaluating predictive models using AUC

To know whether the predictive model with the evolution variable performs well, you can calculate the AUC of the model. To do so, you first need to import the roc_auc_score function from sklearn.metrics. Then, you can use it to calculate the AUC of the model. The first argument of this function should be y, which contains the true target values, and the second should be predictions, which contains the predicted probabilities according to the model. The AUC returned is a value that assesses the quality of your model. A random model has an AUC of 0.5, while a perfect model has an AUC of 1.
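A small worked example may help here. The target values and scores below are made up; only the `roc_auc_score` call itself reflects the scikit-learn API.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical true targets and predicted probabilities.
y = [0, 0, 1, 1]
predictions = [0.1, 0.4, 0.35, 0.8]

# True values first, predicted probabilities second.
auc = roc_auc_score(y, predictions)

# A model that ranks every target above every non-target is perfect.
auc_perfect = roc_auc_score(y, [0.1, 0.2, 0.8, 0.9])

# A model that gives everyone the same score cannot rank at all.
auc_constant = roc_auc_score(y, [0.5, 0.5, 0.5, 0.5])
```

In the first case, three of the four (target, non-target) pairs are ranked correctly, so the AUC is 0.75. The perfect ranking gives 1, and the constant score gives 0.5.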

5. The predictor insight graph

Besides achieving a good AUC value, it is also important to verify whether the variables in the model have a logical relationship with the target. This can be done by constructing the predictor insight graph, which shows the relationship between a predictor and the target. In the first predictive analytics course, you learned to implement two functions to create this graph. The `create_pig_table` function returns a table that has all the information needed to construct the predictor insight graph. The arguments of this function are the basetable, the target and the name of the predictor. The `plot_pig` function takes the resulting predictor insight graph table and the predictor name as arguments, and returns the predictor insight graph. Note that the predictor insight graph can only be constructed for nominal variables. Therefore, if the variable is continuous, it should first be discretized using the pandas `qcut` function, which divides the values of the predictor into bins that each hold roughly the same number of observations.
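The sketch below shows the discretization step and one plausible way to write `create_pig_table`. The body of the function is an assumption based on the description above (bin size and target incidence per predictor value), not necessarily the version from the first course, and the data and column names (`ratio_month_year`, `ratio_disc`, `Incidence`, `Size`) are hypothetical.

```python
import numpy as np
import pandas as pd

def create_pig_table(basetable, target, predictor):
    """Hypothetical sketch: size and target incidence per predictor value."""
    groups = basetable[[target, predictor]].groupby(predictor, observed=True)
    # Mean of a 0/1 target is the incidence; size is the number of rows.
    return groups[target].agg(Incidence="mean", Size="size").reset_index()

# Synthetic basetable with a continuous evolution variable.
rng = np.random.default_rng(2)
basetable = pd.DataFrame({"ratio_month_year": rng.random(500)})
basetable["target"] = (rng.random(500) < basetable["ratio_month_year"]).astype(int)

# Discretize the continuous variable into five quantile-based bins.
basetable["ratio_disc"] = pd.qcut(basetable["ratio_month_year"], 5)

# One row per bin, with the incidence and size needed for the graph.
pig_table = create_pig_table(basetable, "target", "ratio_disc")
print(pig_table)
```

Each row of `pig_table` corresponds to one bar (the size) and one point on the line (the incidence) of the graph.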

6. Predictor insight graph interpretation

This predictor insight graph shows the relationship between the target and the evolution variable that divides the number of donations last month by the number of donations in the last year. The predictor values are divided into five bins; the grey bars show the size of each bin, and the blue line indicates the target incidence in each bin. It is clear that the larger the ratio, the more likely the donor is to donate. This makes sense: donors who have increased their donations over the last month are more likely to donate again.

7. Let's practice!

Time for you to check the performance of the evolution variables you constructed in the exercises!