Generating datasets for clustering

Synthetic data is a valid, privacy-conscious alternative to raw data, and it can help organizations satisfy the requirements of privacy laws and regulations around the world. The make_blobs() function generates data points drawn from isotropic Gaussian (normal) distributions, one "blob" per cluster center, which makes it a convenient way to build datasets for clustering.
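As a rough illustration of what make_blobs() returns, the sketch below (assuming scikit-learn 0.23 or later, where the return_centers parameter exists) draws a small dataset, then checks that each blob's empirical mean sits near the center it was sampled around. The sample count, number of centers, and random_state are arbitrary choices for this example, not part of the exercise.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Draw 600 points from 3 isotropic Gaussian blobs in 2-D.
# random_state fixes the seed so the example is reproducible (an assumption).
X, y, centers = make_blobs(
    n_samples=600, n_features=2, centers=3, cluster_std=1.0,
    random_state=0, return_centers=True,
)

print(X.shape)       # one row per sample, one column per feature
print(np.unique(y))  # the blob label each sample was drawn from

# The mean of each blob's points should lie close to its true center.
for label, center in enumerate(centers):
    print(label, center.round(2), X[y == label].mean(axis=0).round(2))
```

X holds the feature matrix, y holds the integer cluster label of each row, and centers holds the randomly chosen blob centers.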

In this exercise, you will generate a dataset of 15000 samples.

numpy has already been imported as np, and the custom function plot_data_points() has been provided again for this exercise.

Instructions

  • Import the corresponding function from scikit-learn's datasets module for generating clustering datasets.
  • Generate a dataset of 15000 samples with 2 features, 2 centers, and a cluster standard deviation of 3.
  • Print the shape of the resulting generated data.
  • Inspect the resulting data points in a 2-dimensional scatter plot.
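The steps above can be sketched as follows. This is one possible solution, not the course's reference answer: random_state=0 is an added assumption for reproducibility, and because the signature of the provided plot_data_points() helper is not given here, a plain matplotlib scatter plot stands in for it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_blobs

# 15000 samples, 2 features, 2 centers, cluster standard deviation of 3.
X, y = make_blobs(n_samples=15000, n_features=2, centers=2,
                  cluster_std=3, random_state=0)

print(X.shape)         # (15000, 2)
print(np.bincount(y))  # samples are split evenly: [7500 7500]

# Stand-in for plot_data_points(): scatter the two features, coloured by blob.
fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c=y, s=2, alpha=0.5)
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
ax.set_title("make_blobs: 15000 samples, 2 centers")
# In an interactive session, call plt.show() to display the figure.
```

With cluster_std=3 the two blobs are wide, so depending on where their centers land they may overlap noticeably in the scatter plot.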