Aan de slagGa gratis aan de slag

Datasets with the same probabilistic distribution

The goal of synthetic data is to create a dataset that is as realistic as possible, and does so without endangering important pieces of personal information. For instance, a team at Deloitte Consulting generated 80% of the training data for a machine learning model by synthesizing data. The resulting model accuracy was similar to a model trained on real data.

In this exercise, you will generate a synthetic dataset from scratch using Faker that follows a probabilistic distribution loaded as p.

The Faker generator fake_data has been already initialized and numpy is imported as np.

Deze oefening maakt deel uit van de cursus

Data Privacy and Anonymization in Python

Cursus bekijken

Praktische interactieve oefening

Probeer deze oefening eens door deze voorbeeldcode in te vullen.

# Obtain or specify the probabilities
p = (0.46, 0.26, 0.16, 0.1, 0.02)

# Generate 5 random cities 
cities = ____

# See the generated cities
print(cities)
Code bewerken en uitvoeren