1. Anonymization
Welcome! Now that you know the technical and organizational measures for protecting data, we need to assess whether these measures are robust, especially those related to anonymization and to processes for using or sharing anonymized data. Let's find out more!
2. Anonymized data
According to the GDPR (Recital 26), anonymous information is “information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable”.
Once data is truly anonymous and individuals are no longer identifiable, the data no longer falls within the scope of the GDPR.
Anonymization can be an effective way for organizations to harness the potential of their data while respecting data protection obligations and laws like the GDPR.
3. Anonymization techniques
There are many anonymization techniques; I will introduce a few commonly used ones.
Aggregation, where data is presented only as totals or descriptive statistics, such as counts or averages.
Data perturbation, where values from the original dataset are modified to be slightly different, for example by adding small random noise or by rounding.
Swapping or shuffling, also referred to as permutation, where data is rearranged within the dataset so that individual attribute values are still represented but generally no longer correspond to their original records.
Record suppression, where an entire record is removed from the dataset.
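To make these four techniques concrete, here is a minimal Python sketch applied to a tiny, hypothetical dataset. The records, field layout, and function names (`aggregate`, `perturb`, `shuffle_column`, `suppress`) are illustrative assumptions, not a standard library or a production anonymization tool.

```python
import random
import statistics

# Hypothetical toy dataset: (name, age, city, salary).
# Names are kept only so you can see which row is which; a real release
# would drop direct identifiers before applying any of these techniques.
RECORDS = [
    ("Ana", 34, "Lyon", 52000),
    ("Ben", 29, "Lyon", 48000),
    ("Chloe", 41, "Nice", 61000),
    ("Dev", 38, "Nice", 58000),
    ("Eva", 67, "Brest", 95000),
]

def aggregate(records):
    """Aggregation: publish only a count and mean salary per city."""
    by_city = {}
    for _, _, city, salary in records:
        by_city.setdefault(city, []).append(salary)
    return {city: {"count": len(s), "mean_salary": statistics.mean(s)}
            for city, s in by_city.items()}

def perturb(records, max_noise=2, seed=0):
    """Perturbation: shift each age by a random integer in [-max_noise, max_noise]."""
    rng = random.Random(seed)
    return [(n, age + rng.randint(-max_noise, max_noise), c, s)
            for n, age, c, s in records]

def shuffle_column(records, col, seed=0):
    """Shuffling: permute one column so its values no longer line up with their rows."""
    rng = random.Random(seed)
    values = [r[col] for r in records]
    rng.shuffle(values)
    return [r[:col] + (v,) + r[col + 1:] for r, v in zip(records, values)]

def suppress(records, predicate):
    """Record suppression: drop every record for which predicate(record) is True."""
    return [r for r in records if not predicate(r)]

print(aggregate(RECORDS))
# Eva is the only person in Brest, so her "aggregate" is her own salary;
# suppressing that record removes the group that is too small to hide in.
print(suppress(RECORDS, lambda r: r[2] == "Brest"))
```

Notice that aggregation on its own does not protect the Brest group, which contains a single person. This is why techniques are often combined, for instance aggregating and then suppressing groups below a minimum size.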
4. Risks of re-identification
We have seen that the objective of anonymization is to remove all information related to an individual in such a way that it is impossible to re-identify them.
However, as new technologies, data sources, and powerful computational resources emerge, anonymized data may not stay completely anonymous and can be re-identified. You have to be aware of this limitation, because it strongly affects your ability to share, reuse, or monetize data that you consider anonymous but that, in reality, may not be. Let's see an example.
5. Anonymous DNA donors?
Genetic data profiles are an example of personal data at risk of re-identification. A few researchers tested whether they could identify the individuals behind anonymous genetic profiles in a genealogy database; the profiles were considered anonymous because direct personal identifiers, like names, had been removed.
The researchers combined the anonymous profile information with publicly available resources, such as genealogy registers, obituaries, and search engine results, and with other details, like the time of donation and the donor's age, to reveal the identities of the supposedly anonymous donors.
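The core idea behind this kind of attack is linkage: joining an "anonymized" dataset with a public one on attributes they share. The Python sketch below shows that principle on small, hypothetical tables; it is not the method used in the study. The field names, records, and the `link` helper are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical "anonymized" release: names removed, quasi-identifiers kept.
RELEASED = [
    {"birth_year": 1980, "zip": "75001", "sex": "F", "diagnosis": "asthma"},
    {"birth_year": 1975, "zip": "75002", "sex": "M", "diagnosis": "diabetes"},
    {"birth_year": 1975, "zip": "75002", "sex": "M", "diagnosis": "flu"},
]

# Hypothetical public source (for example, a register) that still has names.
PUBLIC = [
    {"name": "Alice Martin", "birth_year": 1980, "zip": "75001", "sex": "F"},
    {"name": "Bruno Petit", "birth_year": 1975, "zip": "75002", "sex": "M"},
    {"name": "Carl Roux", "birth_year": 1975, "zip": "75002", "sex": "M"},
]

QUASI_IDENTIFIERS = ("birth_year", "zip", "sex")

def link(released, public, keys=QUASI_IDENTIFIERS):
    """Join both datasets on the quasi-identifiers.

    Returns (released_record, candidate_names) pairs. Exactly one candidate
    means the "anonymous" record has been re-identified.
    """
    index = defaultdict(list)
    for person in public:
        index[tuple(person[k] for k in keys)].append(person["name"])
    return [(rec, index.get(tuple(rec[k] for k in keys), [])) for rec in released]

for rec, names in link(RELEASED, PUBLIC):
    status = "RE-IDENTIFIED" if len(names) == 1 else f"{len(names)} candidates"
    print(rec["diagnosis"], "->", names, status)
```

No name appears in the released data, yet the first record matches exactly one person in the public table. The other two records each match two people, which is why larger groups of look-alike records make linkage harder.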
6. Key considerations
After anonymizing your data, you should check how robust your technique is by answering at least these three questions:
Is it possible to single out an individual? This is known as singling-out risk.
Is it possible to link records relating to an individual? This is known as linkage risk.
Can information be inferred concerning an individual? This is known as inference risk.
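Two of these questions can be checked directly on the released data. The Python sketch below borrows ideas from k-anonymity (a record in a group of size 1 can be singled out) and l-diversity (a group where everyone shares the same sensitive value leaks that value). These named techniques, the toy records, and the helper names are assumptions for illustration, not part of this course. Linkage risk needs an external dataset to test against, so it is not covered here.

```python
from collections import Counter, defaultdict

# Hypothetical release after generalization: age band and truncated
# postcode are quasi-identifiers; the third field is a sensitive attribute.
RELEASE = [
    ("30-39", "750**", "asthma"),
    ("30-39", "750**", "flu"),
    ("30-39", "750**", "asthma"),
    ("40-49", "751**", "diabetes"),
    ("40-49", "751**", "diabetes"),
    ("60-69", "752**", "cancer"),
]

def singling_out(rows, qi_idx=(0, 1)):
    """Return quasi-identifier combinations shared by exactly one record.

    A group of size 1 means that record can be singled out.
    """
    sizes = Counter(tuple(r[i] for i in qi_idx) for r in rows)
    return [qi for qi, n in sizes.items() if n == 1]

def inference(rows, qi_idx=(0, 1), sensitive_idx=2):
    """Return groups in which every record has the same sensitive value.

    Knowing only that someone belongs to such a group reveals that value,
    even if their exact record cannot be picked out.
    """
    groups = defaultdict(set)
    for r in rows:
        groups[tuple(r[i] for i in qi_idx)].add(r[sensitive_idx])
    return {qi: next(iter(vals)) for qi, vals in groups.items() if len(vals) == 1}

print(singling_out(RELEASE))  # [('60-69', '752**')]
print(inference(RELEASE))     # {('40-49', '751**'): 'diabetes', ('60-69', '752**'): 'cancer'}
```

The 40-49 group contains two records, so neither can be singled out, but anyone known to be in that group is revealed to have diabetes. This is why passing one check does not guarantee passing the others.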
7. Utility and data protection
That brings us to the privacy versus utility trade-off: the stronger the data protection, the lower the utility of the data, and vice versa.
We have seen that fully anonymous data is hard to achieve, and even with zero risk of re-identification, the resulting data may be useless. Organizations therefore need to understand this trade-off for their specific processing situations.
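One way to see the trade-off is to generalize ages into wider and wider bands. The Python sketch below uses the smallest group size `k` as a rough privacy measure and the average gap between a person's true age and the midpoint of their band as a rough utility loss. The ages and both measures are illustrative assumptions, not metrics prescribed by the GDPR.

```python
# Hypothetical ages of individuals in a small dataset.
AGES = [21, 24, 26, 28, 33, 35, 37, 39, 44, 47]

def generalize(age, width):
    """Replace an age by the [low, low + width) band it falls into."""
    low = (age // width) * width
    return (low, low + width)

def evaluate(ages, width):
    """Return (k, mean_error) for a given band width."""
    bands = [generalize(a, width) for a in ages]
    counts = {}
    for band in bands:
        counts[band] = counts.get(band, 0) + 1
    # Privacy proxy: size of the smallest band (bigger = harder to single out).
    k = min(counts.values())
    # Utility loss: average distance between true age and its band midpoint.
    error = sum(abs(a - (lo + hi) / 2) for a, (lo, hi) in zip(ages, bands)) / len(ages)
    return k, error

for width in (5, 10, 50):
    k, err = evaluate(AGES, width)
    print(f"band width {width:>2}: k = {k}, mean error = {err:.1f} years")
```

On this toy data, widening the bands raises `k` from 1 to 10, but the average error grows from 1.2 to 9.4 years: more protection, less useful data.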
Data protection risks continue to evolve, and the GDPR anticipates this: organizations should adopt state-of-the-art measures and privacy-by-design approaches to balance privacy and utility while ensuring data protection. We'll learn about these concepts in the next video.
8. Let's practice!
Great! Now let's practice your anonymization knowledge.