Discretizing all variables
Instead of discretizing the continuous variables one by one, it is easier to discretize them automatically in a loop. To get all the column names of the basetable in Python, you can use:
variables = basetable.columns
Only variables that are continuous should be discretized. You can verify whether variables should be discretized by checking whether they have more than a predefined number of different values.
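As a minimal sketch of that check, the snippet below counts the distinct values per column and keeps only the columns above a threshold. The toy basetable, its column names, and the threshold of 5 are assumptions made up for illustration, not data from the course.

```python
import pandas as pd

# Hypothetical toy basetable; column names and values are illustrative only.
basetable = pd.DataFrame({
    "age": [23, 35, 41, 52, 29, 64, 38, 47],
    "gender": ["F", "M", "F", "M", "F", "F", "M", "M"],
    "target": [0, 1, 0, 0, 1, 0, 1, 0],
})

max_distinct = 5  # assumed threshold for treating a variable as continuous

# nunique() counts the distinct non-missing values in each column
distinct_counts = basetable.nunique()

# Keep the non-target columns with more distinct values than the threshold
continuous = [
    col for col in basetable.columns
    if col != "target" and distinct_counts[col] > max_distinct
]
print(continuous)
```

Here only the age column passes the check, since gender has just two distinct values.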
This exercise is part of the course
Introduction to Predictive Analytics in Python
Exercise instructions
- Make a list named variables containing all the column names of the basetable.
- Create a loop that checks all the variables in the list variables.
- Complete the if statement such that only variables with more than 5 different values are discretized.
- Group the continuous variables in 10 bins using the qcut function.
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Print the columns in the original basetable
print(basetable.columns)
# Get all the variable names except "target"
variables = list(____.____)
variables.remove("target")
# Loop through all the variables and discretize in 10 bins if there are more than 5 different values
for variable in ____:
    if len(basetable.groupby(____))>____:
        new_variable = "disc_" + variable
        basetable[new_variable] = pd.qcut(basetable[____], ____)
# Print the columns in the new basetable
print(basetable.columns)
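
For reference, here is one way the whole pattern could look when run end to end. This is a sketch under stated assumptions, not the course's official solution: the basetable is randomly generated toy data with made-up column names, and the duplicates="drop" argument is an added safeguard that the exercise template does not use.

```python
import numpy as np
import pandas as pd

# Hypothetical toy basetable; names and values are illustrative only.
rng = np.random.default_rng(0)
n = 200
basetable = pd.DataFrame({
    "age": rng.integers(18, 90, size=n),
    "income": rng.normal(50_000, 15_000, size=n),
    "gender": rng.choice(["F", "M"], size=n),
    "target": rng.integers(0, 2, size=n),
})

# Get all the variable names except "target"
variables = list(basetable.columns)
variables.remove("target")

for variable in variables:
    # The number of groups equals the number of distinct non-missing values
    if len(basetable.groupby(variable)) > 5:
        new_variable = "disc_" + variable
        # Assumption: duplicates="drop" merges bins with identical edges
        # instead of raising an error when many values are tied.
        basetable[new_variable] = pd.qcut(
            basetable[variable], 10, duplicates="drop"
        )

print(basetable.columns)
```

Because qcut builds quantile-based bins, each bin holds roughly the same number of rows. The gender column is skipped since it has only two distinct values.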