Pre-processing data

Pre-processing can prepare data for more accurate segmentation when clustering. One type of pre-processing is feature scaling, a technique that puts the independent features in the data on a comparable scale, for example by rescaling them to a fixed range such as 0-1 or 0-100, or by standardizing them to zero mean and unit variance.
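To make the two flavors of feature scaling concrete, here is a minimal sketch in plain Python. The helper names min_max_scale and standardize and the sample scores are hypothetical, chosen only for illustration; they are not part of the exercise.

```python
from statistics import mean, pstdev


def min_max_scale(values, low=0.0, high=1.0):
    """Rescale values linearly so the minimum maps to low and the maximum to high."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # A constant feature carries no spread; map everything to low.
        return [low for _ in values]
    return [low + (v - v_min) * (high - low) / span for v in values]


def standardize(values):
    """Shift to zero mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population std (ddof=0), as StandardScaler uses
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical writing scores, not taken from the real dataset.
writing_scores = [40, 55, 70, 85, 100]

scaled_01 = min_max_scale(writing_scores)              # fits the range 0-1
scaled_0_100 = min_max_scale(writing_scores, 0, 100)   # fits the range 0-100
standardized = standardize(writing_scores)             # zero mean, unit variance

print(scaled_01)
print(standardized)
```

Scaling matters for k-means because the algorithm relies on Euclidean distance: a feature measured on a 0-100 scale would otherwise dominate one measured on a much smaller scale.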

In this exercise, you will perform clustering on the parental_level_of_education and writing_score columns of the student performance dataset, loaded as performance. First, you will create and fit a k-means model without any pre-processing. Then, you will do the same after pre-processing the data with feature scaling.

The private k-means model has been imported as KMeans from diffprivlib.models. The StandardScaler scaler and the PCA dimensionality reduction model, referred to below as scaler and pca, have been imported from sklearn.

1
- Create the private clustering model, passing 4 as the number of clusters.
- Fit the model to the performance data.

2
- Standardize the data with the standard scaler, scaler, using the .fit_transform() method.
- Use pca to fit and transform the performance data with the .fit_transform() method.
- Build the private KMeans() model using 4 clusters.
- Fit the model to the performance data.
