Exploring data with a privacy budget accountant
Data exploration systems that provide differential privacy must manage a privacy budget that measures the amount of privacy lost across multiple queries.
In this exercise, you'll explore the IBM HR Analytics Employee Attrition & Performance dataset while keeping track of your privacy budget. Remember that if a query exceeds the privacy budget specified in the accountant, an error is raised.
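To make the budget idea concrete, here is a minimal sketch of an accountant that adds up the epsilon of each query (basic sequential composition) and refuses a query that would overspend. The class `SimpleAccountant` and the exception `BudgetExceededError` are hypothetical names invented for this illustration; they are not the accountant used in the exercise, which may compose budgets differently.

```python
class BudgetExceededError(Exception):
    """Hypothetical error raised when a query would overspend the budget."""


class SimpleAccountant:
    """Toy accountant: sums epsilons under basic sequential composition."""

    def __init__(self, epsilon):
        self.epsilon = epsilon  # total budget available
        self.spent = []         # epsilon of every recorded query

    def total(self):
        return sum(self.spent)

    def spend(self, epsilon):
        # Small tolerance so floating-point rounding does not trigger the error.
        if self.total() + epsilon > self.epsilon + 1e-12:
            raise BudgetExceededError(
                f"query needs {epsilon}, only {self.epsilon - self.total()} left"
            )
        self.spent.append(epsilon)

    def remaining(self, k=1):
        # Budget each of k further queries could use if it is split evenly.
        return (self.epsilon - self.total()) / k

    def __len__(self):
        return len(self.spent)


acc = SimpleAccountant(epsilon=1.5)
acc.spend(0.1)  # e.g. a private histogram
acc.spend(0.9)  # e.g. a private mean
per_query = acc.remaining(k=2)

try:
    acc.spend(0.6)  # more than the 0.5 that is left
    overspent = False
except BudgetExceededError:
    overspent = True

print("Remaining per query for 2 queries:", per_query)
print("Queries recorded:", len(acc))
print("Overspending raised an error:", overspent)
```

With the numbers from this exercise, 1.5 - 0.1 - 0.9 leaves 0.5, so two further queries could each use about 0.25 under this simple rule.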
The histogram is a valuable tool to visualize the data in a differentially private way. The syntax is the same as the corresponding function in numpy, with an epsilon parameter.
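As a rough illustration of what a private histogram does, the sketch below counts values per bin and then adds Laplace noise with scale 1/epsilon to every count. This is a stand-in technique chosen for simplicity and is not necessarily the mechanism the course library uses; `private_histogram`, `laplace_noise`, and the sample ages are all made up for the example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_histogram(values, epsilon, bins=10, value_range=(0.0, 1.0), seed=None):
    """Return (noisy_counts, bin_edges) for equal-width bins over value_range."""
    rng = random.Random(seed)
    low, high = value_range
    width = (high - low) / bins
    edges = [low + i * width for i in range(bins + 1)]

    counts = [0] * bins
    for v in values:
        if low <= v <= high:
            # The last bin includes its right edge, as numpy's histogram does.
            idx = min(int((v - low) / width), bins - 1)
            counts[idx] += 1

    # Adding or removing one record changes one count by 1 (sensitivity 1),
    # so the Laplace scale is 1 / epsilon.
    noisy = [c + laplace_noise(1.0 / epsilon, rng) for c in counts]
    return noisy, edges


sample_ages = [23, 35, 41, 29, 52, 38, 47, 31, 100, 10]  # made-up data
noisy_counts, bin_edges = private_histogram(
    sample_ages, epsilon=0.1, bins=10, value_range=(10, 100)
)
print(noisy_counts)
print(bin_edges)
```

A smaller epsilon means a larger noise scale, so a histogram drawn with epsilon 0.1 spends little budget but can look quite different from the true counts.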
The full dataset is available as hr and the employees' age attribute as ages. A custom function has been created and loaded as show_histogram() to plot the histogram as you did previously in the course.
This exercise is part of the course
Data Privacy and Anonymization in Python
Exercise instructions
- Create a privacy BudgetAccountant with an epsilon of 1.5, using the constructor for it.
- Generate a private histogram from the ages column and with an epsilon value of 0.1.
- Get and show the private average of ages, using an epsilon of 0.9, and bounds from 10 to 100 as a tuple.
- Print the privacy budget remaining for two further queries.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create the privacy Budget Accountant with epsilon of 1.5
acc = ____
# Use the Budget Accountant acc to draw a private histogram of ages with epsilon 0.1
dp_hist, dp_bins = tools.___(____, epsilon=____, range=[10,100], accountant=____)
show_histogram(dp_hist, dp_bins)
# Get and show the private average of the age variable
print("Mean: ", tools.mean(____))
# Show privacy budget remaining for 2 queries
print("Remaining budget for 2 queries: ", ____)
print("Number of queries recorded: ", len(acc))