Data Privacy and Anonymization in Python

Exercise

Masking sensitive PII

You have been given a dataset containing the Social Security numbers (SSNs) of American citizens, along with their city and age. If the subjects consented to share this data only with us, releasing it would be a privacy breach, because it would disclose data they did not expect us to share.

Your job is to anonymize the data by applying partial masking to the sensitive PII column ssn. Remember, data masking is about hiding or obfuscating data to avoid privacy breaches while preserving its overall format and semantics.

The dataset has been loaded as insurance_df, but save the resulting data in masked_df to keep the original insurance_df intact.

Instructions 1/2

  • 1
    • Mask the ssn column of masked_df with '*'.
    • See the first 5 rows of the resulting DataFrame using .head().
  • 2
    • Apply partial masking to ssn with a lambda function that, for every SSN s, concatenates its first character, "****", and its last character (e.g. "1****6").
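
The two steps above can be sketched roughly as follows. This is only one possible approach, not the official solution: the small insurance_df built here is a hypothetical stand-in for the real dataset, the city and age column names are guesses, and the ssn values are assumed to be stored as strings.

```python
import pandas as pd

# Hypothetical stand-in for insurance_df; the real dataset is not shown here.
insurance_df = pd.DataFrame({
    "ssn": ["123-45-6789", "987-65-4321"],
    "city": ["Austin", "Denver"],
    "age": [34, 58],
})

# Work on a copy so the original insurance_df stays intact.
masked_df = insurance_df.copy()

# Step 1: full masking, every ssn value is replaced with '*'.
masked_df["ssn"] = "*"
print(masked_df.head())

# Step 2: partial masking, keep the first and last character of each SSN
# (assumes ssn values are strings) and hide everything in between.
masked_df["ssn"] = insurance_df["ssn"].map(lambda s: s[0] + "****" + s[-1])
print(masked_df.head())
```

Partial masking keeps a hint of the original value (its first and last characters), whereas full masking removes it entirely. Which one is appropriate depends on how much of the SSN can safely remain visible.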