
Generating datasets for classification

Finding a real dataset that meets every desired combination of criteria can be difficult, and data collected from real people may raise privacy concerns. As an alternative, you can use dataset generators to produce synthetic data that approximates real-world datasets.

In this exercise, you will create a large dataset for a 3-class classification problem. To make it easy to visualize the generated data as a scatter plot, a custom function, plot_data_points(), has been provided.
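The course environment supplies plot_data_points() for you, and its exact implementation is not shown here. If you want to run the exercise elsewhere, the following is a minimal, hypothetical stand-in written with matplotlib. It assumes the helper simply scatters the first two feature columns colored by class label; because the generated data has 4 features, any 2D scatter plot necessarily shows only a projection.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_data_points(x, y):
    """Hypothetical stand-in for the course's plot_data_points() helper.

    Scatters the first two feature columns of x, one color per class in y,
    and returns the Axes so the caller can customize or show the figure.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    fig, ax = plt.subplots(figsize=(6, 5))
    # One scatter call per class so each class gets its own color and legend entry
    for label in np.unique(y):
        mask = y == label
        ax.scatter(x[mask, 0], x[mask, 1], s=5, alpha=0.5, label=f"class {label}")
    ax.set_xlabel("feature 0")
    ax.set_ylabel("feature 1")
    ax.legend()
    return ax
```

Call plt.show() afterwards to display the figure in a script.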

This exercise is part of the course

Data Privacy and Anonymization in Python


Exercise instructions

  • Import the corresponding function from sklearn.datasets for generating classification datasets.
  • Generate 5000 samples with 4 features, 1 cluster per class, 3 classes, and a class separation of 2.
  • Print the shape of the generated data.
  • See the resulting scatter plot.
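The class separation setting corresponds to the class_sep parameter of scikit-learn's make_classification, which scales the hypercube on which class centroids are placed: larger values push the classes apart and make the task easier. The sketch below illustrates this by fitting a plain logistic regression at several class_sep values. The helper name holdout_accuracy, the 25% test split, the choice of model, and random_state=42 are illustrative assumptions, not part of the exercise.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def holdout_accuracy(class_sep, random_state=42):
    """Hypothetical helper: generate a 3-class dataset with the given
    class_sep and return the held-out accuracy of a logistic regression."""
    X, y = make_classification(
        n_samples=5000,
        n_features=4,
        n_clusters_per_class=1,
        n_classes=3,
        class_sep=class_sep,
        random_state=random_state,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return model.score(X_test, y_test)


# Same random_state for every run, so only the centroid spacing changes
for sep in (0.5, 1.0, 2.0):
    print(f"class_sep={sep}: accuracy={holdout_accuracy(sep):.3f}")
```

Exact accuracies depend on the random seed, but you should expect the score to rise as class_sep grows.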

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Import the function for generating classification datasets
from sklearn.datasets import ____

# Generate 5000 samples with 4 features, 1 cluster per class, 3 classes, and class separation of 2
x, y = ____

# Inspect the generated data shape
print(____)

# Inspect the resulting data points in a two-dimensional scatter plot
plot_data_points(x, y)
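One way to complete the scaffold is sketched below using scikit-learn's make_classification, whose n_samples, n_features, n_clusters_per_class, n_classes and class_sep parameters map directly onto the instructions. The random_state argument is an added assumption for reproducibility, and the call to the course-provided plot_data_points() is left out so the snippet runs on its own.

```python
from sklearn.datasets import make_classification

# 5000 samples, 4 features, 1 cluster per class, 3 classes, class separation of 2
x, y = make_classification(
    n_samples=5000,
    n_features=4,
    n_clusters_per_class=1,
    n_classes=3,
    class_sep=2,
    random_state=0,  # added for reproducibility; not required by the exercise
)

# Shape of the generated feature matrix: (n_samples, n_features)
print(x.shape)  # (5000, 4)
```

With the defaults n_informative=2 and n_redundant=2, two of the four features carry the class signal and the other two are linear combinations of them. The settings are valid because make_classification requires n_classes * n_clusters_per_class to be at most 2**n_informative, and here 3 * 1 <= 4.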