Selecting the next best variable

The forward stepwise variable selection method starts with an empty variable set and proceeds in steps, where in each step the next best variable is added. To implement this procedure, two handy functions have been implemented for you.

The auc function calculates for a given variable set variables the AUC of the model that uses this variable set as predictors. The next_best function calculates which variable should be added in the next step to the variable list.

In this exercise, you will experiment with these functions to better understand their purpose. You will calculate the AUC of a given variable set, calculate which variable should be added next, and verify that this indeed results in an optimal AUC.

This exercise is part of the course

Introduction to Predictive Analytics in Python

View Course

Exercise instructions

  • The auc function has been implemented for you. Calculate the AUC of a model that uses "max_gift", "mean_gift" and "min_gift" as predictors. You should pass these variables in a list as the first argument to the auc function.
  • The next_best function has been implemented for you. Calculate which variable should be added next, given that "max_gift", "mean_gift" and "min_gift" are currently in the model, and "age" and "gender_F" are the candidate next predictors. The first argument of the next_best function is a list with the current variables, while the second argument is a list with the candidate predictors.
  • Calculate the AUC of a model that uses "max_gift", "mean_gift", "min_gift" and "age" as predictors.
  • Calculate the AUC of a model that uses "max_gift", "mean_gift", "min_gift" and "gender_F" as predictors.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Calculate the AUC of a model that uses "max_gift", "mean_gift" and "min_gift" as predictors
auc_current = ____([____, ____, ____], ["target"], basetable)
print(round(auc_current,4))

# Calculate which variable among "age" and "gender_F" should be added to the variables "max_gift", "mean_gift" and "min_gift"
next_variable = ____([____, ____, ____], [____, ____], ["target"], basetable)
print(next_variable)

# Calculate the AUC of a model that uses "max_gift", "mean_gift", "min_gift" and "age" as predictors
auc_current_age = ____([____, ____, ____, ____], ["target"], basetable)
print(round(auc_current_age,4))

# Calculate the AUC of a model that uses "max_gift", "mean_gift", "min_gift" and "gender_F" as predictors
auc_current_gender_F = ____([____], ["target"], basetable)
print(round(auc_current_gender_F,4))