
Exploring data with a privacy budget accountant

Data exploration systems that provide differential privacy must manage a privacy budget that measures the amount of privacy lost across multiple queries.

In this exercise, you'll explore the IBM HR Analytics Employee Attrition & Performance dataset while keeping track of the privacy budget. Remember that if a query would exceed the privacy budget specified in the accountant, an error is raised.
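To make the budget idea concrete, here is a minimal sketch of an accountant under simple sequential composition, where the epsilons of successive queries add up. `ToyAccountant` and `BudgetExceeded` are hypothetical names invented for this illustration; they are not the library's classes, and a real accountant may also track delta or use tighter composition rules.

```python
class BudgetExceeded(Exception):
    """Raised when a query would push the total spend above the budget."""


class ToyAccountant:
    """Hypothetical accountant: tracks epsilon only, summing the spends."""

    def __init__(self, epsilon):
        self.epsilon = epsilon  # total budget available
        self.spent = []         # epsilon recorded for each query

    def total(self):
        # Simple sequential composition: spent epsilons add up.
        return sum(self.spent)

    def remaining(self, k=1):
        # Budget left, split evenly across k future queries.
        return (self.epsilon - self.total()) / k

    def spend(self, epsilon):
        # Refuse the query if it would exceed the budget
        # (small tolerance for floating-point rounding).
        if self.total() + epsilon > self.epsilon + 1e-12:
            raise BudgetExceeded(f"cannot spend {epsilon}")
        self.spent.append(epsilon)

    def __len__(self):
        # Number of queries recorded so far.
        return len(self.spent)


acc = ToyAccountant(epsilon=1.5)
acc.spend(0.1)  # e.g. a histogram query
acc.spend(0.9)  # e.g. a mean query
per_query = acc.remaining(k=2)

try:
    acc.spend(0.6)  # more than what is left
    overspend_blocked = False
except BudgetExceeded:
    overspend_blocked = True

print("Remaining per query for 2 queries:", per_query)
print("Queries recorded:", len(acc))
```

The rejected query is not recorded, so the accountant still counts only the two queries that were allowed.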

A histogram is a useful way to visualize data in a differentially private manner. The syntax matches that of the corresponding numpy function, with an additional epsilon parameter.
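As a rough illustration of what a private histogram does, the sketch below counts values per bin and then adds Laplace noise with scale 1/epsilon to each count. This uses only the standard library and assumes that adding or removing one record changes a single bin count by 1. The function names are hypothetical, and the Laplace mechanism is an assumption made for this example; the library used in the course may rely on a different mechanism.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_histogram(values, epsilon, bins, value_range, rng=None):
    """Hypothetical helper: equal-width histogram with Laplace noise."""
    rng = rng or random.Random()
    low, high = value_range
    width = (high - low) / bins

    # Exact counts; values outside the range are ignored.
    counts = [0] * bins
    for v in values:
        if low <= v <= high:
            idx = min(int((v - low) / width), bins - 1)
            counts[idx] += 1

    # Sensitivity 1 (one record affects one bin), so scale = 1 / epsilon.
    scale = 1.0 / epsilon
    noisy = [c + laplace_noise(scale, rng) for c in counts]
    edges = [low + i * width for i in range(bins + 1)]
    return noisy, edges


# Made-up ages, for illustration only.
sample_ages = [23, 35, 41, 29, 52, 38, 47, 33, 60, 27]
noisy_counts, bin_edges = noisy_histogram(
    sample_ages, epsilon=0.1, bins=9, value_range=(10, 100),
    rng=random.Random(0),
)
print(noisy_counts)
print(bin_edges)
```

A smaller epsilon gives a larger noise scale, so the counts are more private but less accurate. Noisy counts can be negative or fractional.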

The full dataset is available as hr and the employees' age attribute as ages. A custom function has been created and loaded as show_histogram() to plot the histogram as you did previously in the course.

This exercise is part of the course

Data Privacy and Anonymization in Python


Instructions

  • Create a privacy BudgetAccountant with an epsilon of 1.5, using its constructor.
  • Generate a private histogram from the ages column with an epsilon value of 0.1.
  • Get and show the private average of ages, using an epsilon of 0.9 and bounds from 10 to 100 as a tuple.
  • Print the privacy budget remaining for two further queries.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Create the privacy Budget Accountant with epsilon of 1.5
acc = ____

# Use the Budget Accountant acc to draw a private histogram of ages with epsilon 0.1
dp_hist, dp_bins = tools.___(____, epsilon=____, range=[10,100], accountant=____)
show_histogram(dp_hist, dp_bins)

# Get and show the private average of the age variable
print("Mean: ", tools.mean(____))

# Show privacy budget remaining for 2 queries
print("Remaining budget for 2 queries: ", ____)
print("Number of queries recorded: ", len(acc))