Generating datasets for clustering
Synthetic data is fully legal and meets the requirements of privacy laws and regulations around the world, which makes it a valid, privacy-conscious alternative to raw data. The make_blobs() function can be used to generate data points with a Gaussian (or normal) distribution.
In this exercise, you will generate a dataset of 15000 samples.
numpy has already been imported as np, and the custom function plot_data_points() has been provided again for this exercise.
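As a rough illustration of what a Gaussian distribution means for make_blobs(), the sketch below draws points around a single fixed center and checks that the sample mean and standard deviation land near the requested values. The center (5, -5), the sample size, the standard deviation of 2, and random_state are arbitrary choices for illustration and are not part of the exercise.

```python
import numpy as np
from sklearn.datasets import make_blobs

# One blob centered at (5, -5) with a standard deviation of 2 per feature.
# The center, sample size, and random_state are illustrative assumptions.
points, blob_labels = make_blobs(
    n_samples=10000,
    centers=[[5.0, -5.0]],
    cluster_std=2.0,
    random_state=0,
)

# Per-feature statistics of the generated points.
sample_mean = points.mean(axis=0)
sample_std = points.std(axis=0)

print(points.shape)
print(np.round(sample_mean, 1), np.round(sample_std, 1))
```

With this many samples, the printed mean should sit close to the chosen center and the printed standard deviation close to 2 for both features.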
This exercise is part of the course
Data Privacy and Anonymization in Python
Instructions
- Import the corresponding function from the datasets module for generating clustering datasets.
- Generate a dataset of 15000 samples with 2 features, 2 centers, and a cluster standard deviation of 3.
- Print the shape of the resulting generated data.
- Inspect the resulting data points in a 2-dimensional scatter plot.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Import the function from the datasets module for generating clustering datasets
from sklearn.datasets import ____
# Generate a dataset with 15000 rows, 2 features, 2 centers, and a cluster std of 3
x, labels = ____
# Print the shape of the resulting generated data
print(____)
# See the resulting data points in a 2 dimensional scatter plot
plot_data_points(x, labels)
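One possible way to fill in the blanks is sketched below. It assumes the intended function is make_blobs(), as named in the exercise description. The random_state argument is an addition for reproducibility and is not required by the instructions. The course's custom plot_data_points() helper is not reproduced here, so the plotting step is left out.

```python
from sklearn.datasets import make_blobs

# 15000 samples, 2 features, 2 centers, cluster standard deviation of 3.
# random_state is an added assumption so that reruns give the same points.
x, labels = make_blobs(
    n_samples=15000,
    n_features=2,
    centers=2,
    cluster_std=3,
    random_state=42,
)

# One row per sample, one column per feature.
print(x.shape)  # (15000, 2)
```

In the exercise environment, calling plot_data_points(x, labels) afterwards would then draw the two clusters.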