Privacy budgets
1. Privacy budgets
Let's dive into privacy budgets.
2. Definition of differential privacy
Cynthia Dwork, one of the inventors of differential privacy, presents it formally with a mathematical definition. A randomized mechanism is epsilon-differentially private if, for any two databases that differ in only one row, the probabilities of producing any given output differ by at most a factor of e to the power of epsilon, so the outputs are nearly indistinguishable. In other words, any conclusion about a person holds with almost the same probability, whether or not that person's data is in the dataset. Two important quantities in DP mechanisms are epsilon, the privacy parameter, and accuracy, the closeness of the output of the DP mechanism to the true, non-private output.
3. $\epsilon$ the privacy parameter
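Written out, the definition of epsilon-differential privacy is usually stated as the inequality below. The symbol names are a notational choice rather than something taken from the slides: M is the randomized mechanism, D and D' are any two databases that differ in one row, and S is any set of possible outputs.

$$\Pr[M(D) \in S] \le e^{\epsilon} \cdot \Pr[M(D') \in S]$$

Because the inequality must also hold with D and D' swapped, a small epsilon forces the two output distributions to be nearly identical.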
Remember, epsilon is a measure of privacy loss. The smaller the value, the stronger the privacy protection.
4. Privacy budget
Imagine Anna as the data curator of sensitive data.
5. Privacy budget
And Ben, who wants to access it.
6. Privacy budget
If Ben makes the same private query with epsilon = 1 twice and receives two different estimates, it's as if he had made a single query with epsilon = 2. In other words, the privacy loss adds up.
7. Privacy budget
Every time he queries the same data, he reduces the level of anonymization because he can average the answers together and get a more accurate estimate, filtering out the noise.
8. Privacy budget
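The averaging effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: the query is a count with sensitivity 1, the noise comes from a hand-rolled Laplace mechanism built on the standard library, and the true count of 100 and the helper names `laplace_noise` and `noisy_count` are made up for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count, epsilon, rng):
    """Laplace mechanism for a counting query (sensitivity assumed to be 1)."""
    return true_count + laplace_noise(1 / epsilon, rng)


rng = random.Random(0)
TRUE_COUNT = 100    # hypothetical true answer, unknown to Ben
EPSILON = 1.0       # privacy parameter of each individual query
N_QUERIES = 1000    # how many times Ben repeats the same query

answers = [noisy_count(TRUE_COUNT, EPSILON, rng) for _ in range(N_QUERIES)]
averaged = sum(answers) / len(answers)

# Under basic sequential composition, the privacy losses add up.
total_epsilon = EPSILON * N_QUERIES

print(f"first answer:      {answers[0]:.2f}")
print(f"average of {N_QUERIES}:   {averaged:.2f}")
print(f"total privacy loss: epsilon = {total_epsilon}")
```

A single answer can be off by a few units, while the average of many answers lands very close to the true count, which is why the repeated queries have to be counted against a budget.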
This can be addressed with a privacy budget: an absolute limit on the privacy loss that any individual or group is allowed to accrue. Data curators can track who queries their data and what they ask. This way, you can define how much of the privacy budget can be spent before the data is no longer considered anonymous.
9. What's private enough?
There's not much consensus about what values of epsilon are "private enough". Whether a value is good enough depends on the query as well as the data. One possible calculation is based on "delta", the adversary's advantage in the probability of guessing some specific property of the output. It's a more complicated approach, and it can still lead to a wide range of possible values for epsilon.
10. What's private enough?
Nevertheless, values between 0 and 1 are widely considered very good. Values above 10 are not, and values between 1 and 10 offer various degrees of "better than nothing". Epsilon acts exponentially: the guarantee bounds probability ratios by e to the power of epsilon, so a system with epsilon = 1 is over 8,000 times more private than one with epsilon = 10.
11. What's private enough?
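One way to read the "over 8,000 times" figure is as the ratio between the two bounds that the definition allows, e to the power of 10 versus e to the power of 1. The short check below is a sketch of that arithmetic only, not a formal measure of privacy.

```python
import math

# Bound on the probability ratio allowed by each epsilon.
bound_eps_1 = math.exp(1)
bound_eps_10 = math.exp(10)

# Ratio of the two bounds: e^10 / e^1 = e^9.
ratio = bound_eps_10 / bound_eps_1

print(round(ratio))  # 8103
```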
According to a study conducted by researchers from the University of Southern California and other universities, in 2017 Apple was allegedly using privacy budgets of epsilon = 14 per day, with unbounded privacy loss over the long term. According to Apple's current official documentation, they use specific privacy budgets depending on the feature. For emoji suggestions, Apple uses a budget with epsilon = 4 and submits one contribution per day.
12. Privacy budget: how to track it
Diffprivlib includes a budget accountant that lets you keep track of the privacy budget being spent. Import the BudgetAccountant class. We can initialize it with optional values of epsilon and delta (the probability of data leakage). The default value of epsilon is infinite, meaning no limit.
13. Privacy budget: how to track it
You can pass the accountant to differentially private operations, for example, calculating the arithmetic mean, which is the average value. Here we compute the private mean of the salaries from the White House dataset. Pass the accountant as an argument to the accountant parameter, so the privacy cost of the operation is deducted from our budget. In this case, we also set the bounds of the data, to avoid leaking information about the minimum and maximum salaries.
14. Privacy budget: how to track it
Then you can determine the total current spend with the total() method, and the remaining budget with remaining(). You can also get the number of budget spends by calculating the length of the accountant.
15. Privacy budget: how to track it
You can also see how much budget would be available for each query if the remainder were split across 2 more queries.
16. Let's practice!
Let's practice!