Exercise

Generating datasets for classification

Finding a real dataset that meets every desired criterion can be difficult, and collecting one may raise privacy concerns. As an alternative, you can use dataset generators to produce good approximations of real-world datasets.

In this exercise, you will create a large dataset for a 3-class classification problem. A custom function, plot_data_points(), is provided so you can easily visualize the generated data as a scatter plot.

Instructions

  • Import the corresponding function from sklearn.datasets for generating classification datasets.
  • Generate 5000 samples with 4 features, 1 cluster per class, 3 classes, and a class separation of 2.
  • Print the shape of the generated data.
  • See the resulting scatter plot.
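
The steps above can be sketched roughly as follows, assuming the intended generator is make_classification from sklearn.datasets. The random_state value is an assumption added only so the output is reproducible, and the variable names x and y are illustrative. The call to plot_data_points() is left out because its signature is defined by the course environment and is not shown here.

```python
# Import the generator for synthetic classification datasets
from sklearn.datasets import make_classification

# Generate 5000 samples with 4 features, 3 classes, 1 cluster per class,
# and a class separation of 2.
# random_state is an assumption, added only for reproducibility.
x, y = make_classification(
    n_samples=5000,
    n_features=4,
    n_clusters_per_class=1,
    n_classes=3,
    class_sep=2,
    random_state=42,
)

# Print the shape of the generated feature matrix
print(x.shape)
```

With the remaining parameters left at their defaults, two of the four features are informative and two are redundant linear combinations of them. A larger class_sep spreads the classes further apart, which generally makes them easier to separate in a scatter plot.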