Consistent synthetic dataset

One scenario in which companies use synthetic data is the training of artificial intelligence and machine learning models. Real-world data is sometimes expensive to collect, or simply hard to come by. When the training data is highly imbalanced (e.g., more than 90% of instances belong to one class), generating synthetic examples of the underrepresented class can help build more accurate machine learning models.

In this exercise, you will generate a mobile app rating dataset using Faker.

The initial DataFrame is loaded as ratings with two columns: rating and gender. A Faker() generator has already been initialized for you as fake_data.

This exercise is part of the course

Data Privacy and Anonymization in Python

Interactive exercise

Try this exercise by completing the sample code.

# Generate a name according to the gender that will be unique in the dataset
ratings['name'] = [____ if x == "Female" 
                   else ____
                   for x in ratings['gender']] 