
Data masking with PCA

Companies widely use PCA for pseudo-anonymization. Many Kaggle challenges and datasets provide the data only after it has gone through a PCA transformation.

A differentially private version of PCA is also included in diffprivlib, in the models module. It is based on the PCA class from sklearn but adds optional arguments for epsilon and for the min and max bounds, just as we saw in the previous chapter.
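As a rough illustration of the differentially private variant, the sketch below fits the diffprivlib PCA on made-up data. The synthetic array `X`, the alias `DPPCA`, and the specific values chosen for `epsilon`, `bounds`, and `data_norm` are assumptions for this example, not part of the exercise. The import is guarded because diffprivlib is a separate package that may not be installed.

```python
import numpy as np

try:
    # Differentially private PCA from diffprivlib's models module
    from diffprivlib.models import PCA as DPPCA
except ImportError:
    # diffprivlib is an optional third-party package
    DPPCA = None

# Made-up data: 200 rows, 3 features, every value known to lie in [0, 1]
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(200, 3))

masked = None
if DPPCA is not None:
    dp_pca = DPPCA(
        n_components=3,
        epsilon=1.0,           # privacy budget (assumed value)
        bounds=(0.0, 1.0),     # (min, max) chosen from domain knowledge
        data_norm=np.sqrt(3),  # upper bound on the L2 norm of any row
    )
    masked = dp_pca.fit_transform(X)
    print(masked[:5])
```

The bounds are stated up front rather than computed from `X`, since deriving them from the data itself would leak information about it.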

In this exercise, you will apply data masking with PCA on the NBA Salaries dataset, already loaded as players.

This exercise is part of the course Data Privacy and Anonymization in Python.

Exercise instructions

  • Import PCA from sklearn.decomposition.
  • Initialize PCA() with the number of components to be the same as the number of columns.
  • Apply pca to players.
  • See the resulting dataset.

Hands-on interactive exercise

Complete this sample code to finish the exercise.

# Import PCA from Scikit-learn
____

# Initialize PCA with number of components to be the same as the number of columns
pca = ____

# Apply PCA to the data
players_pca = ____

# Print the resulting dataset
print(pd.DataFrame(players_pca))
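
For reference, here is one way the blanks above might be filled in. It is a minimal sketch, not the official solution. Because the NBA Salaries data is not included here, `players` is replaced by a small made-up table whose column names and values are purely hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stand-in for the preloaded `players` DataFrame
rng = np.random.default_rng(0)
players = pd.DataFrame({
    "age": rng.integers(19, 40, size=100),
    "games_played": rng.integers(1, 83, size=100),
    "salary": rng.normal(5e6, 2e6, size=100).round(),
})

# Keep as many components as there are columns
pca = PCA(n_components=players.shape[1])

# Fit PCA and project the data onto the principal components
players_pca = pca.fit_transform(players)

# The masked table no longer carries the original column meanings
print(pd.DataFrame(players_pca).head())
```

Since every component is kept, anyone holding the fitted `pca` object can recover the original values with `pca.inverse_transform(players_pca)`, which is why this technique counts as pseudo-anonymization rather than full anonymization.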