Data Privacy and Anonymization in Python


Exercise

Removing names with faker

In this exercise, you will work with the 2018 NBA Salaries dataset. If this data weren't already public, it would carry a high risk of a re-identification attack. For example, because only one NBA player is named "Aaron Brooks", anyone with the dataset could link his name to sensitive information such as his exact annual salary. Removing personal names from the dataset reduces the potential harm to the people in it.
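To see why a name like this is risky, you can count how many rows share each name: a name that appears exactly once pins that row, and its salary, to a single person. The sketch below is a minimal illustration that assumes a toy DataFrame with hypothetical `player` and `salary` columns; the names and values are invented placeholders, not real NBA data.

```python
import pandas as pd

# Toy stand-in for the nba DataFrame; names and salaries are invented placeholders.
df = pd.DataFrame({
    "player": ["Player A", "Player B", "Player B", "Player C"],
    "salary": [100, 200, 300, 400],
})

# Count how many rows share each name.
name_counts = df["player"].value_counts()

# Names that occur exactly once single out one row, and therefore one salary.
unique_names = name_counts[name_counts == 1].index
at_risk = df[df["player"].isin(unique_names)]
print(at_risk)
```

Every row returned by this check is a candidate for re-identification, which is why replacing the names removes the link.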

Faker's .name() method generates random names of any gender, so its output includes female names as well. You will also generate names of a single gender.

The DataFrame has been loaded as nba.

Instructions 1/3

  • 1
    • Import the Faker class.
    • Initialize the Faker generator as fake_data.
  • 2
    • Replace the NBA players' names using the Faker method .name(), applying it to the player column with a lambda function.
  • 3
    • Generate male names only, for example with the .name_male() method.