
Generating datasets for classification

Finding a real dataset that meets every desired combination of criteria can be difficult, and collecting one can raise privacy concerns. As a solution, you can use dataset generators to produce synthetic datasets that approximate real-world ones.

In this exercise, you will create a large dataset for a 3-class classification problem. To make it easy to visualize the generated data as a scatter plot, a custom function, plot_data_points(), has been provided.
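The source of plot_data_points() is not shown, so the following is only a hypothetical stand-in to make the plotting step concrete. It assumes the helper draws the first two feature columns of x and colors each point by its class label in y; the real course helper may project or style the data differently.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_data_points(x, y):
    """Hypothetical stand-in for the course's plot_data_points() helper.

    Assumption: it plots the first two feature columns of x and colors
    each point by its class label y. The real helper may differ.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    fig, ax = plt.subplots()
    # One scatter call per class so each class gets its own color and legend entry
    for label in np.unique(y):
        mask = y == label
        ax.scatter(x[mask, 0], x[mask, 1], s=5, alpha=0.5, label=f"class {label}")
    ax.set_xlabel("feature 0")
    ax.set_ylabel("feature 1")
    ax.legend()
    return ax
```

Returning the Axes object, rather than calling plt.show() inside the function, lets the caller decide whether to display or save the figure.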

This exercise is part of the course Data Privacy and Anonymization in Python.


Exercise instructions

  • Import the corresponding function from sklearn.datasets for generating classification datasets.
  • Generate 5000 samples with 4 features, 1 cluster per class, 3 classes, and a class separation of 2.
  • Print the shape of the generated data.
  • See the resulting scatter plot.
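The class separation setting corresponds to the class_sep parameter of scikit-learn's make_classification, where larger values push the class clusters further apart. As an illustrative sketch (the choice of logistic regression, the comparison value 0.5, and the 75/25 train/test split are assumptions, not part of the exercise), you can compare how well a simple classifier separates the classes at two settings, holding the random seed fixed so that only class_sep changes:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def holdout_accuracy(class_sep, seed=0):
    """Generate a 3-class dataset with the given class_sep and return test accuracy."""
    x, y = make_classification(
        n_samples=5000,
        n_features=4,
        n_clusters_per_class=1,
        n_classes=3,
        class_sep=class_sep,
        random_state=seed,  # fixed seed so only class_sep differs between runs
    )
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.25, random_state=seed
    )
    # Assumption: a linear classifier is enough to gauge how separable the classes are
    model = LogisticRegression(max_iter=1000).fit(x_train, y_train)
    return model.score(x_test, y_test)


for sep in (0.5, 2.0):
    print(f"class_sep={sep}: accuracy={holdout_accuracy(sep):.3f}")
```

With the larger separation you should expect a noticeably higher accuracy, since the cluster centers move further apart while the noise stays the same.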

Exercise code

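# Import the function for generating classification datasets
from sklearn.datasets import make_classification

# Generate 5000 samples with 4 features, 1 cluster per class, 3 classes, and class separation of 2
x, y = make_classification(
    n_samples=5000,
    n_features=4,
    n_clusters_per_class=1,
    n_classes=3,
    class_sep=2,
)

# Inspect the generated data shape: (5000, 4)
print(x.shape)

# Inspect the resulting data points in a two-dimensional scatter plot
# (plot_data_points() is the custom helper provided with the exercise)
plot_data_points(x, y)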
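Because make_classification draws random samples, each call produces a different dataset unless you pass a fixed random_state. The sketch below, assuming an arbitrary seed of 42, checks that two calls with the same seed return identical data and prints the class counts. With the defaults n_informative=2 and n_redundant=2, the 4 features consist of two informative features plus two linear combinations of them, and 3 classes times 1 cluster per class fits within the 2**n_informative = 4 available hypercube vertices.

```python
import numpy as np
from sklearn.datasets import make_classification

params = dict(
    n_samples=5000,
    n_features=4,
    n_clusters_per_class=1,
    n_classes=3,
    class_sep=2,
    random_state=42,  # assumption: any fixed integer seed makes the output repeatable
)

x1, y1 = make_classification(**params)
x2, y2 = make_classification(**params)

# Same seed, same parameters: identical samples and labels
print(np.array_equal(x1, x2) and np.array_equal(y1, y2))  # True

# Class sizes are roughly equal; the default flip_y=0.01 randomly reassigns
# about 1% of labels, so the counts are not exactly balanced
print(np.bincount(y1))
```

Fixing the seed is useful when sharing synthetic data, since anyone with the same parameters can regenerate the exact dataset.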