Finding the order of variables
The forward stepwise variable selection procedure starts with an empty set of variables, and adds predictors one by one. In each step, the predictor that has the highest AUC in combination with the current variables is selected.
In this exercise you will learn to implement the forward stepwise variable selection procedure. To this end, you can use the next_best
function that has been implemented for you. It can be used as follows:
next_best(current_variables,candidate_variables,target,basetable)
where current_variables
is the list of variables that is already in the model and candidate_variables
the list of variables that can be added next.
This exercise is part of the course
Introduction to Predictive Analytics in Python
Exercise instructions
- Use the function
next_best
to calculate the next best variable and assign it tonext_variable
. - Update the
current_variables
list. - Update the
candidate_variables
list.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Find the candidate variables
candidate_variables = list(basetable.columns.values)
candidate_variables.remove("target")
# Initialize the current variables
current_variables = []
# The forward stepwise variable selection procedure
number_iterations = 5
for i in range(0, number_iterations):
next_variable = ____(____, ____, ["target"], basetable)
current_variables = current_variables + [____]
candidate_variables.remove(____)
print("Variable added in step " + str(i+1) + " is " + next_variable + ".")
print(current_variables)