Data Privacy and Anonymization in Python

Exercise

Consistent synthetic dataset

One scenario in which companies use synthetic data is the training of artificial intelligence and machine learning models. Real-world data is sometimes expensive to collect, or simply hard to come by. When the training data is highly imbalanced (e.g., more than 90% of instances belong to one class), synthetic data generation can help build accurate machine learning models.

In this exercise, you will generate a mobile app rating dataset using Faker.

The initial DataFrame is loaded as ratings with two columns: rating and gender. A Faker() generator has already been initialized for you as fake_data.

Instructions 1/3

  • Create a name column in the ratings DataFrame containing unique names corresponding to the gender column.