Discretizing all variables
Instead of discretizing the continuous variables one by one, it is easier to discretize them automatically. basetable.columns returns a pandas Index rather than a Python list, so to get an editable list of all the column names you can use
variables = list(basetable.columns)
Only continuous variables should be discretized. A simple way to decide whether a variable is continuous is to check whether it has more than a predefined number of distinct values.
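To make the distinct-value check concrete, here is a minimal sketch on a small hypothetical basetable (the columns age and gender and their values are invented for illustration). It counts distinct values in two ways: len() of a groupby object, as the exercise code does, and Series.nunique(), which returns the same count more directly. Both skip missing values by default.

```python
import pandas as pd

# Hypothetical toy basetable; column names and values are illustrative only
basetable = pd.DataFrame({
    "target": [0, 1, 0, 1, 0, 0, 1, 0],
    "age": [23, 35, 47, 51, 29, 62, 44, 38],
    "gender": ["F", "M", "F", "F", "M", "M", "F", "M"],
})

# Threshold on the number of distinct values (5, as in the exercise)
max_distinct = 5

for variable in ["age", "gender"]:
    # groupby(variable) creates one group per distinct value,
    # so len() of the GroupBy object equals the number of distinct values
    n_groups = len(basetable.groupby(variable))
    # nunique() gives the same count without building groups
    n_unique = basetable[variable].nunique()
    print(variable, n_groups, n_unique, n_unique > max_distinct)

# Prints:
# age 8 8 True
# gender 2 2 False
```

With a threshold of 5, age would be treated as continuous and discretized, while gender would be left as it is.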
This exercise is part of the course Introduction to Predictive Analytics in Python.
Exercise instructions
- Make a list, variables, containing all the column names of the basetable.
- Create a loop that checks every variable in the list variables.
- Complete the if statement so that only variables with more than 5 distinct values are discretized.
- Group the continuous variables into 10 bins using the pandas qcut function.
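The sketch below shows what qcut does with 10 bins, using a hypothetical series of 20 values named income (invented for illustration). qcut places the bin edges at the sample deciles, so each bin receives roughly the same number of observations rather than covering the same width of values.

```python
import pandas as pd

# Hypothetical continuous variable with 20 distinct values
values = pd.Series(range(1, 21), name="income")

# Cut at the sample deciles: 10 equal-frequency bins
binned = pd.qcut(values, 10)

# The result is a categorical Series whose categories are pandas Intervals;
# here every bin holds exactly 2 of the 20 values
print(binned.value_counts(sort=False))
```

One design caveat: if a variable has many tied values, several decile edges can coincide and qcut raises a ValueError about non-unique bin edges. Passing duplicates="drop" merges those edges, at the cost of getting fewer than 10 bins.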
Have a go at this exercise by completing this sample code.
# Print the columns in the original basetable
print(basetable.columns)
# Get all the variable names except "target"
variables = list(____.____)
variables.remove("target")
# Loop through all the variables and discretize in 10 bins if there are more than 5 different values
for variable in ____:
    if len(basetable.groupby(____)) > ____:
        new_variable = "disc_" + variable
        basetable[new_variable] = pd.qcut(basetable[____], ____)
# Print the columns in the new basetable
print(basetable.columns)
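For reference, here is one way the scaffold could run end to end, sketched under stated assumptions: the basetable is a hypothetical, deterministic toy DataFrame (the columns age, max_gift and gender are invented), and the blanks are filled according to the exercise instructions (more than 5 distinct values, 10 bins). It is not the course's own data or official solution.

```python
import numpy as np
import pandas as pd

# Hypothetical toy basetable standing in for the course data
n = 100
idx = np.arange(n)
basetable = pd.DataFrame({
    "target": (idx % 3 == 0).astype(int),
    "age": idx % 60 + 20,                       # 60 distinct values: continuous
    "max_gift": np.linspace(5, 200, n),         # 100 distinct values: continuous
    "gender": np.where(idx % 2 == 0, "F", "M"), # 2 distinct values: not discretized
})

# All variable names except "target"
variables = list(basetable.columns)
variables.remove("target")

# Discretize into 10 quantile bins if a variable has more than 5 distinct values
for variable in variables:
    if len(basetable.groupby(variable)) > 5:
        new_variable = "disc_" + variable
        basetable[new_variable] = pd.qcut(basetable[variable], 10)

print(basetable.columns)
```

Because variables is a separate list built before the loop, adding the new disc_ columns to basetable inside the loop does not change which variables are visited.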