Pre-processing data
Pre-processing data before clustering can lead to more accurate segmentation. One common type of pre-processing is feature scaling, a technique that puts the independent features in the data on a comparable scale, for example by mapping each one to a fixed range such as 0-1 or 0-100.
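As a rough illustration of what feature scaling does, here is a minimal sketch in plain Python. The helper names min_max_scale and standardize and the sample scores are made up for this example and are not part of the exercise. The first helper maps values to a fixed range. The second shifts them to zero mean and unit variance, which is the kind of scaling StandardScaler applies.

```python
from statistics import mean, pstdev

def min_max_scale(values, low=0.0, high=1.0):
    """Rescale values linearly so the smallest maps to low and the largest to high."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # A constant feature has no spread; map every value to low.
        return [low for _ in values]
    return [low + (v - v_min) * (high - low) / span for v in values]

def standardize(values):
    """Shift values to zero mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical writing scores, invented for illustration
scores = [40, 55, 70, 85, 100]

scaled = min_max_scale(scores)   # values now lie between 0 and 1
zscores = standardize(scores)    # values now have mean 0 and standard deviation 1

print(scaled)
print(zscores)
```

Which of the two is more suitable depends on the data: min-max scaling guarantees a fixed range, while standardization does not bound the values but is less distorted by a single extreme value.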
In this exercise, you will perform clustering on the parental_level_of_education and writing_score columns of the student performance dataset, loaded as performance. First, you will create and run a k-means model without pre-processing the data. Then, you will do the same after pre-processing the data with feature scaling.
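To see why scaling can change the result, note that k-means groups points by Euclidean distance, so a feature with a wide range can dominate one with a narrow range. The sketch below uses three invented students and assumes, purely for illustration, that education level is encoded as an integer from 0 to 5 and that writing scores run from 0 to 100. The exercise does not specify the encoding.

```python
import math

# Hypothetical students: (encoded education level 0-5, writing score 0-100)
a = (0, 70)
b = (5, 70)   # same score as a, opposite end of the education scale
c = (0, 80)   # same education level as a, ten points higher score

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

# On the raw data, the ten-point score gap outweighs the full education gap.
raw_ab = euclidean(a, b)
raw_ac = euclidean(a, c)

def rescale(p):
    """Map both features to the 0-1 range using the assumed maxima 5 and 100."""
    return (p[0] / 5, p[1] / 100)

# After rescaling, the education gap is the larger of the two distances.
scaled_ab = euclidean(rescale(a), rescale(b))
scaled_ac = euclidean(rescale(a), rescale(c))

print(raw_ab, raw_ac)
print(scaled_ab, scaled_ac)
```

Before scaling, a looks closer to b than to c. After scaling, the order flips, so the two features contribute more evenly to the clustering.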
The private k-means model has been imported as KMeans from diffprivlib.models. The StandardScaler scaler and the PCA dimensionality reduction class have been imported from sklearn.
This exercise is part of the course Data Privacy and Anonymization in Python.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Build the differentially private k-means model
model = KMeans(____)
# Fit the model to the data
____
# Print the inertia in the console output
print("The inertia of the private model is: ", model.inertia_)
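The inertia printed above is the sum of squared distances from each sample to its nearest cluster center. A minimal plain-Python sketch of that calculation follows. The inertia helper, the points, and the centroids are invented for illustration and do not come from the exercise data.

```python
def inertia(points, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(
            sum((x - c) ** 2 for x, c in zip(p, centroid))
            for centroid in centroids
        )
    return total

# Hypothetical two-dimensional data forming two tight groups
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]

# Every point sits at distance 1 from its nearest centroid.
result = inertia(points, centroids)
print(result)
```

Because inertia is measured in the squared units of the features, a value computed on raw data and one computed on scaled data are not directly comparable. A differentially private model also adds random noise, so its inertia will likely vary from run to run.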