Exercise

Generating datasets for clustering

Synthetic data is fully legal and meets the requirements of privacy laws and regulations around the world, which makes it a valid, privacy-conscious alternative to raw data. The make_blobs() function from scikit-learn generates data points drawn from Gaussian (normal) distributions, one per cluster.

In this exercise, you will generate a dataset of 15000 samples.

numpy has already been imported as np, and the custom function plot_data_points() has been provided again for this exercise.

Instructions

  • Import the function for generating clustering datasets from the sklearn.datasets module.
  • Generate a dataset of 15000 samples with 2 features, 2 centers, and a cluster standard deviation of 3.
  • Print the shape of the resulting generated data.
  • Inspect the resulting data points in a 2-dimensional scatter plot.