1. What do we do with data biases?
Hey fellow data champions, let's find out more about data biases.
2. What is bias?
When we think of data ethics, one of the first things that comes to mind is bias and discrimination. But what is bias? Bias is an unfair prejudice in favor of, or against, a particular person, group, or even an ideology.
In the context of data ethics, this translates to aspects, elements, and attributes in datasets that are not accurately represented (that is, under-, over-, or misrepresented), which can lead to unwanted and harmful outcomes when this data is used as a basis for decision-making.
3. Anytime, anywhere
Data biases can occur at any stage of the data life cycle. In particular, data biases creep into the data collection processes and dataset development steps, such as cleaning, preparation, and data labeling. These datasets are then used either for analytics or AI models, further perpetuating this bias.
4. Data specific biases
Data biases can be more technical or statistical, arising from the choices made during data collection or preparation. Examples include sampling bias, which occurs when only part of the population is selected; errors in the measurement of attributes; inconsistencies when people self-evaluate or self-report their data; and errors and issues in data labeling.
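To make sampling bias concrete, here is a minimal sketch with made-up numbers: a hypothetical population of ages, a biased sample that only reaches younger people (as an online-only survey might), and a simple random sample for comparison. The population values and the age cutoff are assumptions purely for illustration.

```python
import random

random.seed(42)

# Hypothetical population: ages of 10,000 people, spread across adulthood.
population = [random.randint(18, 90) for _ in range(10_000)]

# Biased collection: only people under 40 end up in the dataset
# (e.g. a survey distributed solely through a youth-skewed app).
biased_sample = [age for age in population if age < 40][:1000]

# Representative collection: a simple random sample of the same size.
random_sample = random.sample(population, 1000)

pop_mean = sum(population) / len(population)
biased_mean = sum(biased_sample) / len(biased_sample)
random_mean = sum(random_sample) / len(random_sample)

print(f"Population mean age:  {pop_mean:.1f}")
print(f"Biased sample mean:   {biased_mean:.1f}")  # well below the true mean
print(f"Random sample mean:   {random_mean:.1f}")  # close to the true mean
```

Any analysis or model built on the biased sample would systematically misjudge the population, which is exactly how collection-stage choices turn into downstream harm.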
However, more complicated data biases, such as human and systematic biases, are more implicit and intrinsic. Examples include entrenched attitudes about gender, ethnic stereotypes, and cultural artifacts that may have crept into datasets.
5. Representation is crucial
One of the significant issues with data biases is the lack of representation in datasets. Joy Buolamwini, an MIT researcher, famously demonstrated in her TED talk how facial recognition algorithms were better at recognizing her when she wore a white mask, because they were not trained with enough data on Black people. This can affect everyday experiences, from soap dispensers that fail to detect darker skin to healthcare AI that performs poorly for minority individuals. I also recommend Coded Bias, an excellent documentary featuring Joy and her work on biases.
6. A mirror to our stereotypes
Speaking of healthcare, our historical and cultural biases can affect the data used for automated diagnosis with some dangerous outcomes.
For example, most of the data on medicine and the seminal medical textbooks were the results of studies of the human body limited to the white male body. In many textbooks, symptoms of diseases that present differently in women are labeled as atypical. Take heart attack symptoms, for instance: men typically have constricting chest pain, while women may have back pain. Imagine a dataset that mainly represents male heart attack symptoms; what happens when we use this data to make health decisions or to develop diagnostic apps? You may have guessed right.
7. Serious impact
This is a screenshot of the results from a diagnostic app built on biased data. It correctly diagnosed a man's symptoms as an emergency heart issue, while it wrongly suggested that a woman with comparable symptoms was only having a panic attack. This small illustration shows the devastating, even life-threatening, nature of biased data.
8. Way too many and counting!
And there are so many biases! Don't worry about the fine print in this picture; it illustrates more than 180 cognitive biases, and of course it's not a complete list.
9. Tip of the iceberg
It's not possible to detect and deal with every bias. Most of the biases we can tackle are statistical and computational, and these can be prevented through improved methods and processes. Human and systematic biases, by contrast, are hard to change with technology and instead require broader changes in societal attitudes. Still, organizations should maintain a heightened awareness of biases, have mechanisms to identify, measure, and mitigate them, evaluate their correction mechanisms with an open mind, and stay continuously open to feedback for the responsible and right use of data.
10. Let's practice!
Excellent, now let's practice!