Predicting salaries
In this exercise, you will use the census income dataset to predict whether individuals earn more than $50K per year.
Remember to specify the bounds as a parameter when creating the private model. If you leave them out, the model has to compute them from the data, which causes additional privacy loss and information leakage. Ideally, choose the bounds independently of the data, either from domain knowledge or by searching for them with a DP histogram.
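Searching for bounds with a DP histogram can be sketched in plain Python. This is a minimal sketch, not diffprivlib's API: the helper names `laplace_noise` and `dp_bounds_from_histogram`, the wide domain range, the bin count and the threshold are all hypothetical choices for illustration. It assumes neighbouring datasets differ by adding or removing one record, so every histogram count has sensitivity 1 and Laplace noise with scale 1/epsilon makes the counts epsilon-DP.

```python
import random

def laplace_noise(scale):
    """Draw one sample from Laplace(0, scale) as the difference of two exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_bounds_from_histogram(values, domain_min, domain_max, n_bins, epsilon, threshold):
    """Hypothetical helper: estimate (lower, upper) bounds for one feature.

    domain_min/domain_max are a deliberately wide range taken from domain
    knowledge. Each record lands in exactly one bin, so each count has
    sensitivity 1 (add/remove-one neighbours) and Laplace(1/epsilon) noise suffices.
    """
    width = (domain_max - domain_min) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clip into the domain, then find the bin index (last bin includes domain_max)
        v = min(max(v, domain_min), domain_max)
        idx = min(int((v - domain_min) / width), n_bins - 1)
        counts[idx] += 1
    # Noisy counts are DP; everything below is post-processing on them
    noisy = [c + laplace_noise(1.0 / epsilon) for c in counts]
    occupied = [i for i, c in enumerate(noisy) if c > threshold]
    if not occupied:
        return domain_min, domain_max  # fall back to the domain-knowledge range
    lower = domain_min + occupied[0] * width
    upper = domain_min + (occupied[-1] + 1) * width
    return lower, upper
```

For example, with 1,000 made-up integer ages between 20 and 59, a domain of 0 to 120, 24 bins and a threshold of 20, the sketch returns bounds of 20.0 and 60.0. The threshold trades off noise against coverage: a higher threshold is less likely to keep an empty bin by chance, but may drop sparsely populated tails.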
The dataset has been loaded and split into X_train, y_train, X_test, and y_test. The classifier is available as dp_GaussianNB.
This exercise is part of the course Data Privacy and Anonymization in Python.
Exercise instructions
- Set the bounds of the model by calculating the min and max values of the training data and adding random noise: subtract random integers drawn from range(5, 40) (that is, 5 to 39) from the minimums, and add random integers from the same range to the maximums, one for each of the 5 columns in the data.
- Create a dp_GaussianNB classifier with an epsilon of 0.5 and the previously created bounds.
- Fit the model to the data and print its score.
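The bounds step from the instructions can be traced on plain Python lists to see the arithmetic. This is a sketch with a hypothetical helper, `noisy_bounds`, that mirrors the exercise: it computes per-column minimums and maximums, then pads them with distinct random integers from `range(5, 40)` drawn by `random.sample`. The rows in the usage example are made up, not taken from the census dataset.

```python
import random

def noisy_bounds(rows, low=5, high=40):
    """Hypothetical helper: per-column (lower, upper) bounds padded by random margins.

    random.sample(range(low, high), k) draws k distinct integers from
    low..high-1, so each margin is between 5 and 39 with the defaults.
    """
    n_cols = len(rows[0])
    col_min = [min(r[j] for r in rows) for j in range(n_cols)]
    col_max = [max(r[j] for r in rows) for j in range(n_cols)]
    # Push each minimum down and each maximum up by an independent random margin
    lower = [m - d for m, d in zip(col_min, random.sample(range(low, high), n_cols))]
    upper = [m + d for m, d in zip(col_max, random.sample(range(low, high), n_cols))]
    return lower, upper

rows = [[1, 10, 100, 0, 40], [3, 12, 50, 5, 20], [2, 11, 75, 2, 30]]
lower, upper = noisy_bounds(rows)
print(lower, upper)
```

Because every margin is at least 5, the resulting bounds always strictly contain the observed range of each column.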
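import random
# Assumed import: dp_GaussianNB is provided by the exercise; it matches diffprivlib's GaussianNB
from diffprivlib.models import GaussianNB as dp_GaussianNB

# Set the min and max bounds for the data and add noise using random
bounds = (X_train.min(axis=0) - random.sample(range(5, 40), 5),
          X_train.max(axis=0) + random.sample(range(5, 40), 5))
# Build the classifier with an epsilon of 0.5
dp_clf = dp_GaussianNB(epsilon=0.5, bounds=bounds)
# Fit the model to the data and print the score
dp_clf.fit(X_train, y_train)
print("The accuracy of the differentially private model is ",
      dp_clf.score(X_test, y_test))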
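Why the bounds matter can be seen in a generic Laplace-mechanism sketch: the bounds cap how far any single record can move a statistic, and that cap sets the noise scale. This illustration assumes a public record count n and neighbouring datasets that differ by replacing one record; the helper names are hypothetical, and this is not how diffprivlib's GaussianNB is implemented internally.

```python
import random

def laplace_noise(scale):
    """Draw one sample from Laplace(0, scale) as the difference of two exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_mean(values, lower, upper, epsilon):
    """Hypothetical helper: epsilon-DP mean of a column with bounds fixed in advance.

    Clipping to [lower, upper] limits each record's influence. With n public,
    replacing one record changes the mean by at most (upper - lower) / n.
    """
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon)

values = [float(i % 10) for i in range(10000)]
print(dp_mean(values, 0.0, 9.0, 0.5))  # close to the true mean of 4.5
```

Widening the bounds from [0, 9] to [0, 90] multiplies the noise scale by ten, which is why overly loose bounds cost accuracy, while bounds that are too tight clip real values and bias the result.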