Generating datasets for classification
Finding a real dataset that meets every desired combination of criteria can be difficult, and collecting one can raise privacy concerns. As an alternative, you can use dataset generators to produce good approximations of real-world datasets.
In this exercise, you will create a large dataset for a 3-class classification problem. For easy visualization of the generated data as a scatter plot, a custom function has been provided as plot_data_points().
This exercise is part of the course
Data Privacy and Anonymization in Python
Exercise instructions
- Import the corresponding function from sklearn.datasets for generating classification datasets.
- Generate 5000 samples with 4 features, 1 cluster per class, 3 classes, and a class separation of 2.
- Print the shape of the generated data.
- See the resulting scatter plot.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import the function for generating classification datasets
from sklearn.datasets import ____
# Generate 5000 samples with 4 features, 1 cluster per class, 3 classes, and class separation of 2
x, y = ____
# Inspect the generated data shape
print(____)
# Inspect the resulting data points in a 2 dimensional scatter plot
plot_data_points(x, y)
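One possible way to fill in the blanks is sketched below, assuming the intended generator is scikit-learn's make_classification. The random_state value and the class-count printout are additions for illustration only and are not part of the exercise. The plotting call is left as a comment because plot_data_points() is supplied by the exercise environment and is not defined here.

```python
# Sketch of a possible solution; assumes make_classification is the intended generator
import numpy as np
from sklearn.datasets import make_classification

# Generate 5000 samples with 4 features, 1 cluster per class, 3 classes,
# and a class separation of 2
x, y = make_classification(
    n_samples=5000,
    n_features=4,
    n_clusters_per_class=1,
    n_classes=3,
    class_sep=2,
    random_state=42,  # assumption: fixed seed so the result is reproducible
)

# Inspect the generated data shape
print(x.shape)

# Extra check, not required by the exercise: number of samples per class label
print(np.bincount(y))

# In the exercise environment, the provided helper would then be called as:
# plot_data_points(x, y)
```

With these arguments the feature matrix has one row per sample and one column per feature, so the printed shape is (5000, 4). A larger class_sep spreads the class clusters further apart, which tends to make the classes easier to tell apart in the scatter plot.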