
Generating datasets for clustering

Synthetic data is fully legal and meets all the requirements of privacy laws and regulations around the world. It is a valid, privacy-conscious alternative to raw data. The make_blobs() function generates clusters of data points, each cluster drawn from an isotropic Gaussian (normal) distribution.
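To see what "Gaussian clusters" means in practice, the sketch below checks that each generated blob is centered on its requested center and spread with roughly the requested standard deviation. It assumes scikit-learn and NumPy are installed; the explicit `centers` array and `random_state=0` are illustrative choices for this sketch, not values from the exercise.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Illustrative cluster centers chosen for this sketch (not from the exercise)
centers = np.array([[0.0, 0.0], [10.0, 10.0]])

# Each blob is sampled from a normal distribution around its center,
# with standard deviation cluster_std along every feature
x, labels = make_blobs(n_samples=15000, centers=centers, cluster_std=3, random_state=0)

for k, center in enumerate(centers):
    cluster = x[labels == k]
    # Sample mean should be close to the center, sample std close to 3
    print(k, len(cluster), cluster.mean(axis=0).round(2), cluster.std(axis=0).round(2))
```

With an integer `n_samples`, make_blobs splits the samples evenly across the centers, so each of the two clusters here receives 7500 points.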

In this exercise, you will generate a dataset of 15000 samples.

numpy has already been imported as np, and the custom function plot_data_points() has been provided again for this exercise.

This exercise is part of the course

Data Privacy and Anonymization in Python


Instructions

  • Import the corresponding function from the datasets module for generating clustering datasets.
  • Generate a dataset of 15000 samples with 2 features, 2 centers, and a cluster standard deviation of 3.
  • Print the shape of the resulting generated data.
  • Inspect the resulting data points in a 2-dimensional scatter plot.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Import the function from the datasets module for generating clustering datasets
from sklearn.datasets import make_blobs

# Generate a dataset with 15000 rows, 2 features, 2 centers, and a cluster std of 3
x, labels = make_blobs(n_samples=15000, n_features=2, centers=2, cluster_std=3)

# Print the shape of the resulting generated data
print(x.shape)

# See the resulting data points in a 2 dimensional scatter plot
plot_data_points(x, labels)
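The plot_data_points() helper is supplied by the course environment and its source is not shown here. To run the exercise elsewhere, the sketch below is a hypothetical stand-in built on matplotlib: only the function name and call signature match the course helper, and the implementation (colormap, marker size, legend) is an assumption. `random_state=42` is added only to make the run reproducible.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_blobs


def plot_data_points(x, labels):
    """Hypothetical stand-in for the course's plot_data_points() helper.

    Draws a 2-D scatter plot of x, coloring each point by its cluster label.
    """
    fig, ax = plt.subplots(figsize=(6, 6))
    scatter = ax.scatter(x[:, 0], x[:, 1], c=labels, cmap="viridis", s=5, alpha=0.5)
    # One legend entry per distinct label, derived from the scatter's color mapping
    ax.legend(*scatter.legend_elements(), title="Cluster")
    ax.set_xlabel("Feature 1")
    ax.set_ylabel("Feature 2")
    return ax


x, labels = make_blobs(n_samples=15000, n_features=2, centers=2, cluster_std=3, random_state=42)
print(x.shape)  # (15000, 2)
ax = plot_data_points(x, labels)
plt.show()
```

With a cluster standard deviation of 3, the two blobs may overlap noticeably depending on where the randomly chosen centers land.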