Reducing identification risk with generalization

In this exercise, you will apply generalization on the IBM HR Analytics Employee Attrition & Performance dataset.

More specifically, you will transform the variable monthly_income into a binary column. The threshold for the transformation is the mean of the salaries, rounded to an integer. Values less than or equal to this integer mean become 0, and values greater than it become 1.

The dataset is loaded as a pandas DataFrame hr.

This exercise is part of the course

Data Privacy and Anonymization in Python

Instructions

  • Calculate the mean value of the monthly_income column using .mean() and round it to an integer. Save it as mean_income.
  • Apply a lambda function to hr['monthly_income'] to generalize the incomes to be 0 for values less than or equal to the mean_income, and 1 for those that are greater.
  • Explore the first five rows of the resulting DataFrame hr.
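
The steps above can be sketched on a small stand-in DataFrame. This is a minimal illustration, not the exercise solution on the real data: the toy hr frame and its values are invented here, and only the column name monthly_income comes from the exercise.

```python
import pandas as pd

# Hypothetical stand-in for the course's hr DataFrame (values are made up)
hr = pd.DataFrame({
    "age": [41, 49, 37, 33, 27, 32],
    "monthly_income": [5993, 5130, 2090, 2909, 3468, 3068],
})

# Mean income, rounded to the nearest integer
mean_income = int(round(hr["monthly_income"].mean()))

# Generalize: 0 for incomes at or below the mean, 1 for incomes above it
hr["monthly_income"] = hr["monthly_income"].apply(
    lambda x: 0 if x <= mean_income else 1
)

# Inspect the first five rows of the generalized DataFrame
print(hr.head())
```

After this transformation the exact salaries are no longer recoverable from the column. Only membership in the "at or below the mean" or "above the mean" group remains, which is what reduces the identification risk.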

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Calculate the mean value of incomes
mean_income = ____

# Apply generalization by transforming to binary data
hr['monthly_income'] = ____

# See resulting DataFrame
print(____)