Pre-processing data

Pre-processing data before clustering can lead to more accurate segmentation. One type of pre-processing is feature scaling, a technique that puts the independent features in the data on a comparable scale, for example by rescaling them to a fixed range such as 0-1 or 0-100, or by standardizing them to zero mean and unit variance.
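
As a rough illustration of feature scaling, the sketch below rescales a small list of values to a fixed range and also standardizes it. The helper names (`min_max_scale`, `standardize`) and the sample scores are made up for this example and are not part of the exercise. It uses only the Python standard library.

```python
from statistics import mean, pstdev


def min_max_scale(values, low=0.0, high=1.0):
    """Rescale values linearly so the smallest maps to low and the largest to high."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant feature has no spread; map every value to the lower bound.
        return [low for _ in values]
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]


def standardize(values):
    """Center values on 0 and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up writing scores, for illustration only.
scores = [40, 55, 70, 85, 100]

scaled_01 = min_max_scale(scores)             # fixed range 0-1
scaled_0_100 = min_max_scale(scores, 0, 100)  # fixed range 0-100
z_scores = standardize(scores)                # zero mean, unit variance

print(scaled_01)
print(scaled_0_100)
print(z_scores)
```

Min-max scaling fits the "fixed range" description directly, while standardization is the kind of transformation that scikit-learn's StandardScaler applies to each column.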

In this exercise, you will cluster the parental_level_of_education and writing_score columns of the student performance dataset, loaded as performance. First, you will create and run a k-means model without any pre-processing. Then, you will do the same after pre-processing the data with feature scaling.
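
To see why the two runs can give very different inertia values, here is a minimal sketch that compares k-means inertia on raw and standardized data. It is a stand-in, not the exercise's method: it uses a plain, non-private Lloyd's k-means written with the standard library instead of the differentially private KMeans from diffprivlib. The rows are invented, and the first column assumes the education level has already been encoded as an integer. All function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        tuple((v - mu) / sigma if sigma else 0.0 for v, (mu, sigma) in zip(row, stats))
        for row in rows
    ]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(rows, initial_centers, n_iter=20):
    """Run plain (non-private) Lloyd's k-means and return the final inertia."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for row in rows:
            nearest = min(
                range(len(centers)),
                key=lambda i: squared_distance(row, centers[i]),
            )
            clusters[nearest].append(row)
        # Update step: move each center to the mean of its points.
        centers = [
            tuple(mean(dim) for dim in zip(*members)) if members else center
            for members, center in zip(clusters, centers)
        ]
    # Inertia: sum of squared distances from each point to its nearest center.
    return sum(min(squared_distance(row, c) for c in centers) for row in rows)


# Invented (education_code, writing_score) pairs, for illustration only.
rows = [(0, 42), (1, 48), (0, 51), (4, 78), (5, 85), (5, 91)]
scaled_rows = standardize_columns(rows)

# Use the first and last points as starting centers in both runs.
raw_inertia = kmeans_inertia(rows, [rows[0], rows[-1]])
scaled_inertia = kmeans_inertia(scaled_rows, [scaled_rows[0], scaled_rows[-1]])

print("Inertia without scaling:", raw_inertia)
print("Inertia with scaling:", scaled_inertia)
```

On the raw data the writing score has a much wider range than the education code, so it dominates the distance calculation. After standardization both columns contribute on a comparable scale, which is why the two inertia values are not directly comparable in magnitude.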

The differentially private k-means model has been imported as KMeans from diffprivlib.models. StandardScaler (for feature scaling) and PCA (for dimensionality reduction) have been imported from sklearn.

This exercise is part of the course

Data Privacy and Anonymization in Python

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Build the differentially private k-means model
model = KMeans(____)

# Fit the model to the data
____

# Print the inertia in the console output
print("The inertia of the private model is: ", model.inertia_)