
Exploring data with a privacy budget accountant

Data exploration systems that provide differential privacy must manage a privacy budget that measures the amount of privacy lost across multiple queries.

In this exercise, you'll explore the IBM HR Analytics Employee Attrition & Performance dataset while keeping track of your privacy budget. Remember that if a query would exceed the privacy budget set in the accountant, an error is raised.
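The bookkeeping an accountant performs can be illustrated with a minimal sketch. It assumes pure epsilon-differential privacy and basic sequential composition, where the epsilons of successive queries simply add up. The names SimpleAccountant, BudgetExceededError, spend() and remaining() are hypothetical and chosen for illustration. This is not diffprivlib's implementation, which also tracks delta.

```python
class BudgetExceededError(Exception):
    """Hypothetical error raised when a query would overspend the budget."""


class SimpleAccountant:
    """Hypothetical accountant using basic sequential composition (epsilons add)."""

    def __init__(self, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        self.epsilon = epsilon
        self.spent = []  # epsilon charged by each recorded query

    def total_spent(self):
        return sum(self.spent)

    def remaining(self, k=1):
        # Budget left, split equally across k future queries
        return (self.epsilon - self.total_spent()) / k

    def spend(self, epsilon):
        # Reject the query if it would exceed the budget (small float tolerance),
        # otherwise record it
        if self.total_spent() + epsilon > self.epsilon + 1e-12:
            raise BudgetExceededError(
                f"query needs epsilon={epsilon}, only {self.remaining():.3g} left"
            )
        self.spent.append(epsilon)

    def __len__(self):
        # Number of queries recorded so far
        return len(self.spent)


acc = SimpleAccountant(epsilon=1.5)
acc.spend(0.1)  # e.g. a histogram
acc.spend(0.9)  # e.g. a mean
print(len(acc), acc.remaining(2))
try:
    acc.spend(0.6)  # more than the 0.5 that remains
except BudgetExceededError as err:
    print("Rejected:", err)
```

Sequential composition is the simplest accounting rule; a production accountant such as diffprivlib's may also track delta and offer tighter composition through a slack parameter.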

A private histogram is a valuable way to visualize the data while preserving differential privacy. Its syntax is the same as the corresponding numpy function, numpy.histogram, plus an epsilon parameter (and an optional accountant to charge the query to).
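To see what adding an epsilon parameter to a numpy-style histogram involves, here is a textbook sketch: compute exact counts with numpy.histogram, then add Laplace noise with scale 1/epsilon to each count. This works because adding or removing one person changes exactly one count by 1. The function laplace_histogram is hypothetical, and diffprivlib's tools.histogram may use a different noise mechanism internally; the Laplace mechanism is swapped in here for illustration.

```python
import numpy as np


def laplace_histogram(sample, epsilon, bins=10, range=None, rng=None):
    """Hypothetical epsilon-DP histogram via the Laplace mechanism.

    Each record falls in exactly one bin, so adding or removing one record
    changes one count by 1 (L1 sensitivity 1) and noise of scale 1/epsilon
    suffices. `range` mirrors numpy's parameter name and must be given
    explicitly: bin edges taken from the data's min/max would leak information.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if range is None:
        raise ValueError("pass an explicit range; data-dependent edges are not private")
    rng = np.random.default_rng() if rng is None else rng
    counts, edges = np.histogram(sample, bins=bins, range=range)
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    return noisy, edges


ages = np.array([23, 35, 41, 29, 52, 38, 47, 31, 26, 60])  # toy data, not the HR dataset
dp_hist, dp_bins = laplace_histogram(
    ages, epsilon=0.1, bins=9, range=(10, 100), rng=np.random.default_rng(0)
)
print(np.round(dp_hist, 1))
print(dp_bins)
```

With epsilon=0.1 the noise scale is 10, which swamps the counts of a ten-record toy sample; on a dataset the size of hr the same noise is far less noticeable.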

The full dataset is available as hr, and the employees' ages as ages. A custom function, show_histogram(), has been loaded to plot the histogram as you did previously in the course.

This exercise is part of the course Data Privacy and Anonymization in Python.


Exercise instructions

  • Create a BudgetAccountant with a total epsilon of 1.5 using its constructor.
  • Generate a private histogram of ages with an epsilon of 0.1, charging it to the accountant.
  • Compute and print the private average of ages using an epsilon of 0.9, bounds of 10 to 100 given as a tuple, and the same accountant.
  • Print the privacy budget that remains for each of two further queries.
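The bounds in the mean step matter because a private mean needs a known range to calibrate its noise. A common textbook approach, sketched here, clips every value into [lower, upper] and adds Laplace noise to the average. It assumes the record count n is public and that neighboring datasets differ in one record's value, so one record can shift the mean by at most (upper - lower) / n. The function bounded_laplace_mean is a hypothetical illustration, not the internals of diffprivlib's tools.mean.

```python
import numpy as np


def bounded_laplace_mean(values, epsilon, bounds, rng=None):
    """Hypothetical epsilon-DP mean via clipping plus the Laplace mechanism.

    Assumes n is public and neighbors differ in a single record's value,
    so the clipped mean has sensitivity (upper - lower) / n.
    """
    lower, upper = bounds
    if lower >= upper:
        raise ValueError("bounds must satisfy lower < upper")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng() if rng is None else rng
    # Clip so no single record can move the mean by more than the sensitivity
    clipped = np.clip(np.asarray(values, dtype=float), lower, upper)
    n = clipped.size
    if n == 0:
        raise ValueError("need at least one value")
    sensitivity = (upper - lower) / n
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return clipped.mean() + noise


ages = np.array([23, 35, 41, 29, 52, 38, 47, 31, 26, 60])  # toy data, not the HR dataset
print(bounded_laplace_mean(ages, epsilon=0.9, bounds=(10, 100), rng=np.random.default_rng(0)))
```

Wider bounds are safer against clipping real values but raise the sensitivity, and therefore the noise, for the same epsilon.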

Hands-on interactive exercise

Sample code for this exercise:

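# diffprivlib provides the private tools and the budget accountant
from diffprivlib import tools
from diffprivlib.accountant import BudgetAccountant

# ages (from hr) and show_histogram() are preloaded in the exercise environment

# Create the privacy budget accountant with a total epsilon of 1.5
acc = BudgetAccountant(epsilon=1.5)

# Use the accountant acc to draw a private histogram of ages with epsilon 0.1
dp_hist, dp_bins = tools.histogram(ages, epsilon=0.1, range=[10, 100], accountant=acc)
show_histogram(dp_hist, dp_bins)

# Get and show the private average of ages (epsilon 0.9, bounds 10 to 100)
print("Mean: ", tools.mean(ages, epsilon=0.9, bounds=(10, 100), accountant=acc))

# Show the (epsilon, delta) budget available to each of 2 further queries
print("Remaining budget for 2 queries: ", acc.remaining(2))
print("Number of queries recorded: ", len(acc))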
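The remaining budget can be checked by hand. Under basic sequential composition the epsilons of the recorded queries add up; this assumes both queries are pure epsilon-DP (delta = 0), so only epsilon needs tracking:

$$
\varepsilon_{\text{remaining}} = 1.5 - (0.1 + 0.9) = 0.5,
\qquad
\varepsilon_{\text{per query}} = \frac{\varepsilon_{\text{remaining}}}{2} = \frac{0.5}{2} = 0.25
$$

So the epsilon part of acc.remaining(2) should come out at about 0.25, and len(acc) should count two recorded queries.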