Exercise

Predicting salaries

In this exercise, you will use the census income dataset to predict whether individuals earn more than $50K per year.

Remember that you should specify the bounds as a parameter when creating the private model to avoid additional privacy loss or information leakage. Usually, you can choose the bounds independently of the data, either from domain knowledge or by estimating them with a differentially private (DP) histogram.

The dataset has been loaded and split into X_train, y_train, X_test, and y_test. The classifier is available as dp_GaussianNB.

Instructions

  • Set the bounds of the model: calculate the min and max values of each of the 5 columns in the training data, then add random noise by subtracting a random number between 5 and 40 from each minimum and adding a random number between 5 and 40 to each maximum.
  • Create a dp_GaussianNB classifier with an epsilon of 0.5 and the previously created bounds.
  • Fit the model to the training data and print its score on the test data.
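
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the exercise's reference solution: it assumes that dp_GaussianNB is an alias for diffprivlib's GaussianNB and that X_train and X_test are NumPy arrays with 5 numeric columns. The helper names noisy_bounds and fit_and_score are hypothetical and introduced only for this example.

```python
import numpy as np


def noisy_bounds(X, low=5, high=40, seed=None):
    """Return (lower, upper) per-column bounds padded with random integers."""
    rng = np.random.default_rng(seed)
    n_cols = X.shape[1]
    # rng.integers excludes `high`, so each offset lies in [low, high - 1].
    lower = X.min(axis=0) - rng.integers(low, high, size=n_cols)
    upper = X.max(axis=0) + rng.integers(low, high, size=n_cols)
    return lower, upper


def fit_and_score(X_train, y_train, X_test, y_test, epsilon=0.5):
    """Fit a private naive Bayes model and return its test accuracy."""
    # Assumption: dp_GaussianNB in the exercise is diffprivlib's GaussianNB.
    # Imported here so noisy_bounds can be used without diffprivlib installed.
    from diffprivlib.models import GaussianNB as dp_GaussianNB

    bounds = noisy_bounds(X_train)
    model = dp_GaussianNB(epsilon=epsilon, bounds=bounds)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)
```

With the exercise's variables loaded, calling print(fit_and_score(X_train, y_train, X_test, y_test)) would fit the model and show its accuracy. Because both the bounds and the model are randomized, the score will vary from run to run.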