
Privacy by design & privacy-enhancing technologies

1. Privacy by design & privacy-enhancing technologies

Welcome!

2. Overview

In this video, you will learn the basics of the privacy by design concept, a proactive systems approach to privacy. It is the crux of Article 25 of GDPR, that is, data protection by design and by default. You will also learn the basics of privacy-enhancing technologies, both commonly used and emerging ones.

3. Data protection by design and by default

Data protection by design is a proactive systems approach that aims to build privacy into the design and operation of IT systems, networked infrastructure, and business practices. Data protection by default ensures that data collection settings are set to the most privacy-friendly options by default. However, these concepts, especially privacy by design, may seem high-level and abstract. One way to implement them is through privacy-enhancing technologies.
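To make "by default" a bit more concrete, here is a minimal sketch of what privacy-friendly defaults could look like in code. The class name, field names, and the 30-day retention value are hypothetical and chosen only for illustration; they do not come from GDPR or from any particular product.

```python
from dataclasses import dataclass


@dataclass
class PrivacySettings:
    """Hypothetical per-user settings: every default is the most privacy-friendly option."""
    share_usage_analytics: bool = False   # optional data uses start switched off
    personalized_ads: bool = False
    location_tracking: bool = False
    data_retention_days: int = 30         # assumed shortest retention the service supports


def opted_in_features(settings):
    """Return the names of the optional data uses the user has switched on."""
    return sorted(name for name, value in vars(settings).items() if value is True)


# A new user who never opens the settings page shares nothing optional.
defaults = PrivacySettings()
print(opted_in_features(defaults))  # prints []
```

The point of the sketch is that any additional data collection requires an explicit action by the user, rather than an explicit action to turn it off.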

4. Privacy-enhancing technologies

So what are privacy-enhancing technologies, or PETs? They are software and hardware solutions, technical processes, methods, or knowledge used to achieve data protection functionality and to protect data subjects against privacy risks. We've already looked into some commonly used ones that regulators approve, like anonymization. However, advanced data processing technologies, like machine learning and big data analytics, pose unexpected and complex privacy risks. To keep up with the privacy risks of evolving technologies, GDPR requires organizations to consider state-of-the-art technologies to preserve privacy. We will look into two emerging privacy-preserving technologies: synthetic data and federated learning.

5. Synthetic data

Synthetic data is artificially generated data, often based on real-world datasets. It retains the statistical properties and predictive power of the original data while preserving the privacy of individuals in the generated dataset.
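As a rough illustration of the idea, the sketch below fits a simple two-column Gaussian model to a toy "real" dataset and then samples new rows from that model. The data, column meanings, and function names are all hypothetical. Real synthetic data generators use far richer models, and sampling from a fitted model does not by itself guarantee privacy.

```python
import math
import random
import statistics


def fit_gaussian(rows):
    """Estimate means, standard deviations, and correlation of two numeric columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return mx, my, sx, sy, cov / (sx * sy)


def sample_synthetic(params, n, seed=0):
    """Draw n artificial rows from the fitted bivariate Gaussian."""
    mx, my, sx, sy, rho = params
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        # Mix z1 and z2 so that the two columns keep the original correlation.
        y = my + sy * (rho * z1 + math.sqrt(max(0.0, 1 - rho ** 2)) * z2)
        rows.append((x, y))
    return rows


# Hypothetical "real" records: (age, yearly income).
real_rows = [(25, 32000), (31, 41000), (38, 52000),
             (45, 60000), (52, 58000), (60, 75000)]

real_params = fit_gaussian(real_rows)
synthetic_rows = sample_synthetic(real_params, n=5000)
synthetic_params = fit_gaussian(synthetic_rows)

print("real correlation:     ", round(real_params[4], 2))
print("synthetic correlation:", round(synthetic_params[4], 2))
```

None of the synthetic rows is a copy of a real record, yet the means, spreads, and correlation of the two columns stay close to the originals, which is what "retaining the statistical properties" means in this simplified setting.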

6. Synthetic data use cases

Companies are increasingly adopting synthetic data, especially in regulated sectors like banking, insurance, and healthcare. Companies use synthetic data for two key reasons. First, to enrich their insufficient data with synthetic data augmentation to build and test AI models. Second, and very importantly, to share data within and outside organizations and for open innovation projects while preserving privacy. Leading market analysts predict that synthetic data will become mainstream for AI projects in the near future.

7. Federated learning

Usually, data needs to leave its location to train AI models; that is, it has to be shared or uploaded to external servers, which may pose a privacy risk. Federated learning is a technique that trains AI models over remote devices like mobile phones or secure data centers while keeping data localized. Federated learning effectively enables the data minimization principle of GDPR: the raw user data never leaves the device, and only model updates are sent to the central server. Let's see how this works in practice.

8. Federated learning example

Let's consider how Federated Learning can preserve privacy for a next-word prediction model on mobile phone keyboards. Instead of sending raw user data to a central server, mobile phones communicate periodically with the server to train a global model hosted on the server. At each round, a subset of phones performs local training on their user data and sends updates to the server. After incorporating the updates, the server sends the new global model back to another subset of devices. This iterative training continues across the network until the training goal is met.
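The training loop described above can be sketched with a weighted averaging scheme in the style of federated averaging. This is a toy simulation under stated assumptions, not the method any real keyboard uses: the model is a single weight `w` fitted to `y = w * x`, the "devices" are plain Python lists, and all names and parameters (`local_update`, `federated_round`, the learning rate, the fraction of devices per round) are hypothetical.

```python
import random


def local_update(w, data, lr=0.1, epochs=5):
    """Train on one device's private data and return only the updated weight."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(global_w, clients, rng, fraction=0.5):
    """One round: pick a subset of devices, train locally, average their weights."""
    k = max(1, int(len(clients) * fraction))
    chosen = rng.sample(clients, k)
    # The server sees only (weight, example count) pairs, never the raw data.
    updates = [(local_update(global_w, data), len(data)) for data in chosen]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def make_client(rng, n=20, true_w=3.0):
    """Simulate one device holding n private (x, y) pairs."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, true_w * x + rng.gauss(0, 0.1)) for x in xs]


rng = random.Random(0)
clients = [make_client(rng) for _ in range(10)]

global_w = 0.0
for _ in range(50):
    global_w = federated_round(global_w, clients, rng)

print("learned weight:", round(global_w, 2))
```

In this simulation the learned weight ends up close to the value 3.0 used to generate the data, even though the server-side code only ever handles weights and example counts. Real systems add further protections on top of this basic loop, since model updates themselves can leak information.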

9. Before you go

Remember that there is no perfect privacy solution. You should always consider the scope, context, and related risks of data processing. Adopt suitable measures to preserve privacy based on the risks and benefits of processing personal data.

10. Let's practice!

Go ahead and test your newly acquired privacy knowledge!