Exercise

Pre-processing data

Pre-processing data before clustering can lead to more accurate segmentation. One type of pre-processing is feature scaling, a technique that puts the independent features in the data on a comparable scale, for example by rescaling them to a fixed range such as 0-1 or 0-100, or by standardizing them to zero mean and unit variance.
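As a rough illustration of the two kinds of scaling, here is a minimal sketch in plain Python. The helper names min_max_scale and standardize and the sample scores are made up for this example and are not part of the exercise. The population standard deviation is used, which matches the default behaviour of scikit-learn's StandardScaler.

```python
from statistics import mean, pstdev


def min_max_scale(values, low=0.0, high=1.0):
    """Rescale values linearly so they span the range [low, high]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant feature")
    return [low + (v - lo) * (high - low) / (hi - lo) for v in values]


def standardize(values):
    """Shift and rescale values to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mu) / sigma for v in values]


# Made-up writing scores, used only for illustration
writing_scores = [40, 55, 70, 85, 100]

scaled = min_max_scale(writing_scores)      # values now lie between 0 and 1
standardized = standardize(writing_scores)  # mean 0, standard deviation 1

print(scaled)
print(standardized)
```

Scaling matters for k-means because the algorithm groups points by distance, so a feature with a large numeric range (such as a 0-100 score) can otherwise dominate one with a small range.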

In this exercise, you will perform clustering on the parental_level_of_education and writing_score columns of the student performance dataset, loaded as performance. First, you will create and fit a k-means model without any pre-processing. Then, you will do the same after pre-processing the data with feature scaling.

The private k-means model has been imported as KMeans from diffprivlib.models. StandardScaler (for feature scaling) and PCA (for dimensionality reduction) have been imported from sklearn.

Instructions 1/2

  • 1
    • Create the private clustering model, passing 4 as the number of clusters.
    • Fit the model to the performance data.
  • 2
    • Standardize the data with the standard scaler, scaler, using the .fit_transform() method.
    • Use pca to fit and transform the performance data with the .fit_transform() method.
    • Build the private KMeans() model using 4 clusters.
    • Fit the model to the performance data.