Reducing identification risk with generalization
In this exercise, you will apply generalization on the IBM HR Analytics Employee Attrition & Performance dataset.
More specifically, you will transform the variable monthly_income into a binary column. The threshold for the transformation is the mean salary, rounded to an integer. Values less than or equal to this integer mean become 0, and values greater than it become 1.
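As a quick check of the threshold rule, here is a minimal sketch on a handful of made-up incomes (not the real dataset). It uses a vectorized comparison instead of the lambda the exercise asks for, and it produces the same 0/1 mapping. It assumes the threshold is computed with Python's round(), which rounds halves to the nearest even integer (round(2.5) is 2), so a mean ending in exactly .5 is not always rounded up.

```python
import pandas as pd

# Hypothetical monthly incomes, for illustration only
incomes = pd.Series([2000, 3000, 4000, 5000, 6000])

# Mean = 20000 / 5 = 4000.0, rounded to the integer threshold 4000
mean_income = int(round(incomes.mean()))

# 0 for values <= threshold (including 4000 itself), 1 for values > threshold
binary = (incomes > mean_income).astype(int)

print(mean_income)      # 4000
print(binary.tolist())  # [0, 0, 0, 1, 1]
```

Note that a value exactly equal to the threshold maps to 0, because the rule uses "less than or equal to".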
The dataset is loaded as a pandas DataFrame hr.
This exercise is part of the course
Data Privacy and Anonymization in Python
Exercise instructions
- Calculate the mean value of the monthly_income column using .mean() and round it to an integer. Save it as mean_income.
- Apply a lambda function to hr['monthly_income'] to generalize the incomes to 0 for values less than or equal to mean_income, and 1 for those that are greater.
- Explore the first five rows of the resulting DataFrame hr.
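The three steps above can be sketched end to end as follows. Because the real hr DataFrame is only preloaded inside the exercise environment, this sketch builds a small hypothetical stand-in with invented values; only the monthly_income column name comes from the exercise, and the age column is illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the preloaded hr DataFrame (invented values)
hr = pd.DataFrame({
    "age": [41, 49, 37, 33, 27, 32],
    "monthly_income": [5000, 3000, 2000, 4500, 3500, 6000],
})

# Step 1: mean income rounded to an integer (24000 / 6 = 4000)
mean_income = int(round(hr["monthly_income"].mean()))

# Step 2: generalize each income to 0 (<= mean) or 1 (> mean) with a lambda
hr["monthly_income"] = hr["monthly_income"].apply(lambda x: 0 if x <= mean_income else 1)

# Step 3: inspect the first five rows
print(hr.head())
```

Wrapping round() in int() keeps mean_income a plain Python integer regardless of the NumPy scalar type that .mean() returns.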
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Calculate the mean value of incomes
mean_income = ____
# Apply generalization by transforming to binary data
hr['monthly_income'] = ____
# See resulting DataFrame
print(____)
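To see why this generalization reduces identification risk, one rough check is to compare how many distinct values the column has, and how many records share its rarest value, before and after the transformation. The sketch below uses hypothetical incomes and looks at monthly_income in isolation. It ignores combinations with other columns, which in practice also affect how easily a record can be singled out.

```python
import pandas as pd

# Hypothetical incomes: every value is unique, so each record stands alone
incomes = pd.Series([5000, 3000, 2000, 4500, 3500, 6000])
mean_income = int(round(incomes.mean()))
generalized = incomes.apply(lambda x: 0 if x <= mean_income else 1)

# (number of distinct values, size of the smallest group sharing a value)
before = (incomes.nunique(), int(incomes.value_counts().min()))
after = (generalized.nunique(), int(generalized.value_counts().min()))

print("before:", before)  # before: (6, 1)
print("after:", after)    # after: (2, 3)
```

After generalization, each income value is shared by at least three records instead of one, at the cost of losing the exact salary information.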