Exercise

Datasets with the same probabilistic distribution

The goal of synthetic data is to create a dataset that is as realistic as possible, and does so without endangering important pieces of personal information. For instance, a team at Deloitte Consulting generated 80% of the training data for a machine learning model by synthesizing data. The resulting model accuracy was similar to a model trained on real data.

In this exercise, you will generate a synthetic dataset from scratch using Faker that follows a probabilistic distribution loaded as p.

The Faker generator fake_data has been already initialized and numpy is imported as np.

Instructions 1/2

undefined XP

1

2

Generate a list with 5 random cities as cities to replace the original ones using a list comprehension.

.css-6su6fj{-webkit-flex-shrink:0;-ms-flex-negative:0;flex-shrink:0;}Exercise

Instructions 1/2

Exercise