Exercise

Reducing identification risk with generalization

In this exercise, you will apply generalization to the IBM HR Analytics Employee Attrition & Performance dataset.

More specifically, you will transform the variable monthly_income into a binary column. The threshold for the transformation is the mean of the salaries, rounded up. The new values will be 0 for salaries less than or equal to the integer mean, and 1 for salaries greater than it.

The dataset is loaded as a pandas DataFrame hr.

Instructions

  • Calculate the mean value of the monthly_income column using .mean() and round it to an integer. Save it as mean_income.
  • Apply a lambda function to hr['monthly_income'] to generalize the incomes to 0 for values less than or equal to mean_income, and 1 for values that are greater.
  • Explore the first five rows of the resulting DataFrame hr.
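
The three steps above can be sketched as follows. This is a minimal illustration, not the reference solution: the small DataFrame is a made-up stand-in for the preloaded hr, and rounding with the built-in round() is an assumption (math.ceil would be the choice if a strict round-up is intended).

```python
import pandas as pd

# Hypothetical stand-in for the preloaded DataFrame `hr`;
# these values are invented for illustration only.
hr = pd.DataFrame({
    "age": [41, 49, 37, 33, 27, 32],
    "monthly_income": [5993, 5130, 2090, 2909, 3468, 3068],
})

# Step 1: mean of the column, rounded to an integer.
# Assumption: nearest-integer rounding; swap in math.ceil for a strict round-up.
mean_income = int(round(hr["monthly_income"].mean()))

# Step 2: generalize each income to 0 (<= mean_income) or 1 (> mean_income).
hr["monthly_income"] = hr["monthly_income"].apply(
    lambda x: 0 if x <= mean_income else 1
)

# Step 3: inspect the first five rows of the generalized DataFrame.
print(hr.head())
```

After the transformation, each row reveals only whether the salary sits above the threshold, not the exact amount, which makes it harder to single out an individual by income.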