Predicting salaries

In this exercise, you will use the census income dataset to predict whether an individual earns more than $50K per year.

Remember to pass the bounds as a parameter when you create the private model; otherwise the model computes them from the data itself, which causes additional privacy loss and information leakage. Usually, you can choose the bounds independently of the data, either from domain knowledge or by searching for them with a differentially private (DP) histogram.
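A DP histogram search for bounds can be sketched as follows. This is a minimal, hand-rolled illustration using NumPy's Laplace sampler, not the course's or any library's method: the helper name dp_bounds_from_histogram, its threshold rule, and the default bin count are assumptions. It assumes add/remove-one neighbouring datasets and a generous domain range chosen without looking at the data. Each call spends its own epsilon, on top of the budget used by the model.

```python
import numpy as np


def dp_bounds_from_histogram(column, epsilon, lower, upper, bins=50,
                             threshold=5.0, rng=None):
    """Hypothetical helper (not from the course or diffprivlib).

    Estimate (low, high) bounds for one column from a Laplace-noised histogram.
    `lower` and `upper` must be fixed WITHOUT looking at the data, e.g. a
    generous range from domain knowledge; the returned bounds are epsilon-DP.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Clip so every record lands in one of the data-independent bins.
    clipped = np.clip(np.asarray(column, dtype=float), lower, upper)
    counts, edges = np.histogram(clipped, bins=bins, range=(lower, upper))
    # Adding or removing one record changes exactly one count by 1, so
    # Laplace noise with scale 1/epsilon on every count is epsilon-DP.
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    # Keep the outermost bins whose noisy count clears the threshold;
    # this is post-processing, so it costs no extra privacy budget.
    occupied = np.flatnonzero(noisy > threshold)
    if occupied.size == 0:
        return float(lower), float(upper)  # fall back to the domain range
    return float(edges[occupied[0]]), float(edges[occupied[-1] + 1])


# Usage on a synthetic stand-in for an age column (not the census data).
rng = np.random.default_rng(0)
ages = rng.integers(17, 91, size=1000)
print(dp_bounds_from_histogram(ages, epsilon=0.1, lower=0, upper=150,
                               bins=75, rng=rng))
```

Because the bin edges are fixed in advance, the returned bounds can only be bin edges; a finer grid gives tighter bounds but spreads the data over more, individually noisier, counts.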

The dataset has been loaded and split into X_train, y_train, X_test, and y_test. The classifier is available as dp_GaussianNB.

This exercise is part of the course Data Privacy and Anonymization in Python.

Instructions

  • Set the bounds of the model by computing the per-column minimum and maximum of the training data, then adding random noise: subtract 5 random integers drawn from range(5, 40) (that is, 5 to 39) from the minimums and add 5 more to the maximums, one for each of the 5 columns in the data.
  • Create a dp_GaussianNB classifier with an epsilon of 0.5 and the previously created bounds.
  • Fit the model to the training data and print its accuracy on the test set.
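The bounds step in the first instruction can be written as a small, reusable function and checked on toy data. This is a sketch under assumptions: the name noisy_bounds and its seed parameter are hypothetical, and random.sample requires the number of columns to be at most the size of the range (35 here). Note that these bounds still start from the exact minimum and maximum of the data, so unlike data-independent bounds they do not carry a formal DP guarantee.

```python
import random

import numpy as np


def noisy_bounds(X, low=5, high=40, seed=None):
    """Hypothetical helper: widen per-column min/max with random offsets.

    Offsets come from random.sample(range(low, high), n_features), i.e.
    distinct integers in [low, high - 1], one per column, drawn separately
    for the lower and the upper side.
    """
    rng = random.Random(seed)  # seedable so results are reproducible
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    lower = X.min(axis=0) - np.array(rng.sample(range(low, high), n_features))
    upper = X.max(axis=0) + np.array(rng.sample(range(low, high), n_features))
    return lower, upper


# Usage on a toy 3 x 5 array (not the census data).
X_demo = np.arange(15).reshape(3, 5)
lower, upper = noisy_bounds(X_demo, seed=0)
print(lower, upper)
```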

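import random

from diffprivlib.models import GaussianNB as dp_GaussianNB

# Set the lower and upper bounds from the per-column min and max of the
# training data, widened by 5 distinct random integers in 5..39 per side
bounds = (X_train.min(axis=0) - random.sample(range(5, 40), 5),
          X_train.max(axis=0) + random.sample(range(5, 40), 5))

# Build the classifier with an epsilon of 0.5
dp_clf = dp_GaussianNB(epsilon=0.5, bounds=bounds)

# Fit the model to the training data and print its test accuracy
dp_clf.fit(X_train, y_train)
print("The accuracy of the differentially private model is",
      dp_clf.score(X_test, y_test))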