
Predicting salaries

In this exercise, you will use the census income dataset to predict whether an individual earns more than $50K per year.

Remember to specify the bounds as a parameter when creating the private model, so that there is no additional privacy loss or information leakage. Usually, you can choose the bounds independently of the data, either from domain knowledge or by searching for them with a differentially private (DP) histogram.
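As a minimal sketch of choosing bounds from domain knowledge, the snippet below builds a (min, max) pair without looking at any data. The feature names and ranges are hypothetical, picked only for illustration, and the clipping step assumes the private model clamps inputs to the declared bounds, which is a common design but is not stated in this exercise.

```python
import numpy as np

# Hypothetical feature names and ranges, chosen from general knowledge
# rather than computed from any dataset.
domain_bounds = {
    "age": (0, 120),
    "hours_per_week": (0, 168),
}

lower = np.array([lo for lo, _ in domain_bounds.values()], dtype=float)
upper = np.array([hi for _, hi in domain_bounds.values()], dtype=float)
bounds = (lower, upper)  # (min, max) tuple, one entry per column

# Toy rows: the second row has an out-of-range value in each column.
X = np.array([[35.0, 40.0], [150.0, 200.0]])

# Assumption: values outside the declared bounds are clipped to them.
X_clipped = np.clip(X, lower, upper)
print(X_clipped)
```

Because the bounds never touch the data, declaring them reveals nothing about any individual record.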

The dataset has been loaded and split into X_train, y_train, X_test, and y_test. The classifier is available as dp_GaussianNB.

This exercise is part of the course

Data Privacy and Anonymization in Python


Exercise instructions

  • Set the bounds of the model: calculate the column-wise min and max of the training data, then add noise by subtracting random numbers from the min and adding random numbers to the max, drawing one number in a range from 5 to 40 for each of the 5 columns in our data.
  • Create a dp_GaussianNB classifier with an epsilon of 0.5 and the previously created bounds.
  • Fit the model to the training data and print its score on the test data.
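The steps above can be sketched on synthetic data as follows. This is an illustration under assumptions, not the course's reference solution: the exercise does not name the library behind dp_GaussianNB, so the sketch assumes it is diffprivlib's GaussianNB and skips the model step if that package is not installed. The random arrays stand in for the census features.

```python
import random

import numpy as np

# Assumption: dp_GaussianNB is diffprivlib's GaussianNB.
try:
    from diffprivlib.models import GaussianNB as dp_GaussianNB
except ImportError:  # the bounds logic below still runs without it
    dp_GaussianNB = None

random.seed(0)
rng = np.random.default_rng(0)

# Synthetic stand-in for the data: 200 rows, 5 numeric columns.
X = rng.uniform(low=0, high=100, size=(200, 5))
y = (X[:, 0] + X[:, 1] > 100).astype(int)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Step 1: widen the observed min/max by a random integer per column.
# random.sample draws 5 distinct integers from range(5, 40), i.e. 5..39.
lower = X_train.min(axis=0) - random.sample(range(5, 40), 5)
upper = X_train.max(axis=0) + random.sample(range(5, 40), 5)
bounds = (lower, upper)

# Steps 2 and 3: build, fit, and score the private classifier.
if dp_GaussianNB is not None:
    dp_clf = dp_GaussianNB(epsilon=0.5, bounds=bounds)
    dp_clf.fit(X_train, y_train)
    print("Accuracy:", dp_clf.score(X_test, y_test))
```

Note that `range(5, 40)` excludes 40, so each offset is an integer between 5 and 39. The score will vary between runs, since the private model adds its own random noise.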

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Set the min and max of bounds for the data and add noise using random
bounds = (X_train.____(axis=0) - random.____(range(5, 40), 5), 
          ____)

# Build the classifier with an epsilon of 0.5
dp_clf = ____(epsilon=____, bounds=____)

# Fit the model to the data and print the score
____
print("The accuracy of the differentially private model is",
       dp_clf.score(X_test, y_test))