Exploring data with a privacy budget accountant

Data exploration systems that provide differential privacy must manage a privacy budget that measures the amount of privacy lost across multiple queries.

In this exercise, you'll explore the IBM HR Analytics Employee Attrition & Performance dataset while keeping track of your privacy budget. Remember that if a query would push the total spend past the privacy budget specified in the accountant, an error is raised.
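
To make the bookkeeping concrete, here is a minimal sketch of what a budget accountant does. The class `SimpleBudgetAccountant` and its `spend()` method are hypothetical names invented for illustration, not the accountant used in the exercise, and the sketch assumes plain sequential composition, where the epsilons of successive queries simply add up.

```python
class SimpleBudgetAccountant:
    """Hypothetical stand-in for a privacy budget accountant (illustration only)."""

    def __init__(self, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        self.epsilon = epsilon  # total budget available
        self.spent = []         # one entry per recorded query

    def remaining(self, k=1):
        """Epsilon available to each of k further queries, splitting what is left evenly."""
        left = max(self.epsilon - sum(self.spent), 0.0)
        return left / k

    def spend(self, epsilon):
        """Record a query, refusing it if it would exceed the total budget."""
        if epsilon > self.remaining() + 1e-12:  # small tolerance for float rounding
            raise RuntimeError("query would exceed the privacy budget")
        self.spent.append(epsilon)

    def __len__(self):
        # Number of queries recorded so far
        return len(self.spent)


acc = SimpleBudgetAccountant(epsilon=1.0)
acc.spend(0.3)
acc.spend(0.5)
per_query = acc.remaining(k=2)  # roughly 0.1 for each of two further queries

try:
    acc.spend(0.6)  # more than the roughly 0.2 that is left
    over_budget_rejected = False
except RuntimeError:
    over_budget_rejected = True
```

A rejected query is not recorded, so `len(acc)` still reports two queries after the failed attempt.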

A histogram is a valuable tool for visualizing the data in a differentially private way. The syntax is the same as the corresponding function in numpy, with an additional epsilon parameter.
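
As a rough illustration of how a private histogram can work, the sketch below counts values into equal-width bins and adds Laplace noise to each count. The function names are hypothetical, and the choice of Laplace noise with sensitivity 1 (one record changes one bin count by one) is an assumption made for this sketch; the function used in the exercise may use a different mechanism.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centred on 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_histogram(values, epsilon, bins, value_range, rng):
    """Hypothetical helper: noisy counts and bin edges over equal-width bins."""
    low, high = value_range
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if low <= v <= high:  # values outside the range are ignored
            idx = min(int((v - low) / width), bins - 1)  # last bin includes `high`
            counts[idx] += 1
    # Assumed sensitivity of 1, so the noise scale is 1 / epsilon
    noisy = [c + laplace_noise(1 / epsilon, rng) for c in counts]
    edges = [low + i * width for i in range(bins + 1)]
    return noisy, edges


rng = random.Random(0)
sample_ages = [rng.randint(18, 60) for _ in range(200)]  # synthetic data, not the hr dataset
noisy_counts, bin_edges = private_histogram(sample_ages, 0.1, 10, (10, 100), rng)
```

A smaller epsilon means a larger noise scale, so the bars drift further from the true counts while less of the budget is spent.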

The full dataset is available as hr and the employees' age attribute as ages. A custom function has been created and loaded as show_histogram() to plot the histogram as you did previously in the course.

This exercise is part of the course

Data Privacy and Anonymization in Python

Exercise instructions

  • Create a BudgetAccountant with an epsilon of 1.5 using its constructor.
  • Generate a private histogram of the ages column with an epsilon of 0.1.
  • Compute and print the private mean of ages, using an epsilon of 0.9 and bounds of 10 to 100 passed as a tuple.
  • Print the privacy budget remaining for two further queries.
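
The bounds in the third instruction matter because they cap how much a single record can move the mean. The sketch below shows one common way to do this, assuming values are clipped to the bounds and Laplace noise is scaled to (upper - lower) / n; `private_mean` is a hypothetical helper written for this illustration, not the function called in the exercise.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centred on 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_mean(values, epsilon, bounds, rng):
    """Hypothetical helper: clipped mean plus Laplace noise."""
    lower, upper = bounds
    # Clip every value into [lower, upper] so no record has unbounded influence
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    # Assumed sensitivity: changing one clipped value shifts the mean by at most this
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
sample_ages = [rng.randint(18, 60) for _ in range(200)]  # synthetic data, not the hr dataset
noisy_mean = private_mean(sample_ages, epsilon=0.9, bounds=(10, 100), rng=rng)
```

Wider bounds raise the sensitivity and therefore the noise, so bounds that hug the plausible range of ages give a more accurate mean for the same epsilon.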

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create the privacy Budget Accountant with epsilon of 1.5
acc = ____

# Use the Budget Accountant acc to draw a private histogram of ages with epsilon 0.1
dp_hist, dp_bins = tools.___(____, epsilon=____, range=[10,100], accountant=____)
show_histogram(dp_hist, dp_bins)

# Get and show the private average of the age variable
print("Mean: ", tools.mean(____))

# Show privacy budget remaining for 2 queries
print("Remaining budget for 2 queries: ", ____)
print("Number of queries recorded: ", len(acc))