ComenzarEmpieza gratis

Datasets with the same probabilistic distribution

The goal of synthetic data is to create a dataset that is as realistic as possible, and does so without endangering important pieces of personal information. For instance, a team at Deloitte Consulting generated 80% of the training data for a machine learning model by synthesizing data. The resulting model accuracy was similar to a model trained on real data.

In this exercise, you will generate a synthetic dataset from scratch using Faker that follows a probabilistic distribution loaded as p.

The Faker generator fake_data has been already initialized and numpy is imported as np.

Este ejercicio forma parte del curso

Data Privacy and Anonymization in Python

Ver curso

Ejercicio interactivo práctico

Prueba este ejercicio y completa el código de muestra.

# Obtain or specify the probabilities
p = (0.46, 0.26, 0.16, 0.1, 0.02)

# Generate 5 random cities 
cities = ____

# See the generated cities
print(cities)
Editar y ejecutar código