
Predicting salaries

In this exercise, you will use the census income dataset to predict whether an individual earns more than $50K per year.

Remember to specify the bounds as a parameter when creating the private model, so that no additional privacy loss or information leakage occurs. Usually, you can choose the bounds independently of the data, using domain knowledge, or search for them with a DP histogram.
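As a minimal sketch of what data-independent bounds can look like, the snippet below fixes a lower and upper value per feature from domain knowledge alone. The two features (age in years, hours worked per week) and their ranges are hypothetical and chosen only for illustration; they are not taken from the exercise. The clipping step shows one common way out-of-range values are handled once bounds are fixed, and is not a claim about any library's internals.

```python
import numpy as np

# Hypothetical ranges picked from domain knowledge, not computed from the data:
# column 0 = age in years, column 1 = hours worked per week.
lower = np.array([0.0, 0.0])
upper = np.array([120.0, 168.0])
bounds = (lower, upper)  # (per-column minimums, per-column maximums)

# Toy records; the second row lies outside the declared ranges.
X = np.array([[35.0, 40.0],
              [130.0, 200.0]])

# Clip every value into its column's [lower, upper] interval.
X_clipped = np.clip(X, lower, upper)
print(X_clipped)
```

Because these bounds never look at the records themselves, choosing them does not consume any privacy budget.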

The dataset has been loaded and split into X_train, y_train, X_test, and y_test. The classifier is available as dp_GaussianNB.

This exercise is part of the course

Data Privacy and Anonymization in Python


Exercise instructions

  • Set the bounds of the model: take the min and max values of each column in the training data, then add random noise by subtracting a random number from each minimum and adding a random number to each maximum, drawn from the range 5 to 40, for the 5 columns in our data.
  • Create a dp_GaussianNB classifier with an epsilon of 0.5 and the previously created bounds.
  • Fit the model to the data and see the score.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Set the min and max of bounds for the data and add noise using random
bounds = (X_train.____(axis=0) - random.____(range(5, 40), 5), 
          ____)

# Build the classifier with an epsilon of 0.5
dp_clf = ____(epsilon=____, bounds=____)

# Fit the model to the data and print the score
____
print("The accuracy of the differentially private model is ",
       dp_clf.score(X_test, y_test))
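One possible way to complete the template is sketched below. It rests on two assumptions that the exercise text does not confirm: that `dp_GaussianNB` is `diffprivlib.models.GaussianNB` imported under that alias, and that the blank after `random.` is `sample`, which draws distinct integers from `range(5, 40)`. The helper names `noisy_bounds` and `fit_private_nb` are hypothetical and exist only for this illustration.

```python
import random

import numpy as np


def noisy_bounds(X, low=5, high=40, rng=random):
    """Return (lower, upper): column min/max widened by random integers."""
    X = np.asarray(X, dtype=float)
    n_cols = X.shape[1]
    # rng.sample draws n_cols distinct integers from range(low, high),
    # i.e. values from low up to high - 1.
    lower = X.min(axis=0) - np.array(rng.sample(range(low, high), n_cols))
    upper = X.max(axis=0) + np.array(rng.sample(range(low, high), n_cols))
    return lower, upper


def fit_private_nb(X_train, y_train, X_test, y_test, epsilon=0.5):
    """Fit a private Gaussian naive Bayes model; return (model, accuracy)."""
    # Assumption: the exercise's dp_GaussianNB is diffprivlib's GaussianNB.
    from diffprivlib.models import GaussianNB as dp_GaussianNB

    bounds = noisy_bounds(X_train)
    dp_clf = dp_GaussianNB(epsilon=epsilon, bounds=bounds)
    dp_clf.fit(X_train, y_train)
    return dp_clf, dp_clf.score(X_test, y_test)
```

With the exercise's preloaded data, `dp_clf, acc = fit_private_nb(X_train, y_train, X_test, y_test)` would mirror the three instruction steps. The score varies from run to run, because both the bounds and the model's internal noise are random.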