MulaiMulai sekarang secara gratis

Datasets with the same probabilistic distribution

The goal of synthetic data is to create a dataset that is as realistic as possible, and does so without endangering important pieces of personal information. For instance, a team at Deloitte Consulting generated 80% of the training data for a machine learning model by synthesizing data. The resulting model accuracy was similar to a model trained on real data.

In this exercise, you will generate a synthetic dataset from scratch using Faker that follows a probabilistic distribution loaded as p.

The Faker generator fake_data has been already initialized and numpy is imported as np.

Latihan ini adalah bagian dari kursus

Data Privacy and Anonymization in Python

Lihat Kursus

Latihan interaktif praktis

Cobalah latihan ini dengan menyelesaikan kode contoh berikut.

# Obtain or specify the probabilities
p = (0.46, 0.26, 0.16, 0.1, 0.02)

# Generate 5 random cities 
cities = ____

# See the generated cities
print(cities)
Edit dan Jalankan Kode