
Types of data bias

1. Types of data bias

Hi! In this video, we’ll embark on a journey to understand the diverse manifestations of data bias.

2. The dynamics of decision making

Researchers at the University of California estimated that adults make approximately thirty-five thousand conscious decisions each day, ranging from "what career path should I pursue?" to "what should I eat for dinner?". Humans need to think and react quickly, so over time our brains have evolved to take shortcuts that help us reach conclusions based on information we’ve learned in the past. These mental shortcuts are called heuristics, and they help our brains simplify information processing and reach decisions faster. However, these heuristics can cause cognitive biases.

3. Cognitive biases

Cognitive bias refers to systematic patterns of deviation from the norm or from rationality in judgment and decision-making processes. These biases are related to the way individuals process information and make decisions. Data bias can be a result of those cognitive biases. For example, an analyst who expects a positive impact when analyzing a recent marketing campaign may unconsciously favor positive data and overlook potential shortcomings during the analysis.

4. Systemic biases

The second category is systemic bias. While cognitive biases pertain to individual decision-making processes, systemic biases highlight broader issues guiding data-related activities. Systemic bias refers to biases that are inherent in the processes, structures, or systems used to collect, analyze, and interpret data. These biases frequently arise from systemic inequalities, imbalances, or structural flaws in data-related practices, such as biased data collection methods or flawed algorithmic designs.

5. Bias in the data lifecycle

Systemic and cognitive biases represent the origins of data bias. They can take on various forms at different stages of the data lifecycle, from data collection to the sharing of data insights and decision-making. Understanding the various types of data bias is the first step toward building a robust defense against their impact.

6. Unveiling data collection biases

The initial stage of the data lifecycle is data collection, which is vulnerable to several types of systemic biases. One prominent form is selection bias, where the collection process favors certain groups or characteristics over others, leading to an incomplete or skewed representation of the overall population. Additionally, historical bias may be present in datasets, reflecting past inequalities or systemic issues, perpetuating existing imbalances. Measurement bias occurs when instruments or methodologies systematically misrepresent certain attributes. Whether it's a skewed survey question or an inadequately calibrated sensor, measurement bias can distort the data intended to be an objective reflection of reality. More on data collection bias in chapter two.
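To make selection bias concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the two groups ("urban" and "rural"), their average satisfaction scores, and the chance of reaching each group. It simply shows how a collection process that favors one group pulls the sample average away from the population average.

```python
import random
from statistics import mean


def simulate_selection_bias(seed=0, n=10_000):
    """Compare the true population mean with the mean of a biased sample."""
    rng = random.Random(seed)

    # Hypothetical population: two equally sized groups whose
    # satisfaction scores are centered on different values.
    population = (
        [("urban", rng.gauss(7.0, 1.0)) for _ in range(n // 2)]
        + [("rural", rng.gauss(5.0, 1.0)) for _ in range(n // 2)]
    )
    true_mean = mean(score for _, score in population)

    # Biased collection (assumed): urban respondents are reached 90% of
    # the time, rural respondents only 10% of the time.
    biased_sample = [
        score
        for group, score in population
        if rng.random() < (0.9 if group == "urban" else 0.1)
    ]
    return true_mean, mean(biased_sample)


true_mean, biased_mean = simulate_selection_bias()
print(f"population mean: {true_mean:.2f}")
print(f"biased sample mean: {biased_mean:.2f}")
```

In this toy setup the biased sample overstates the average score, because the group that is easier to reach also happens to score higher. Nothing is wrong with any individual record; the skew comes entirely from who was collected.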

7. Unveiling bias in data analysis

Transitioning from data collection to the analytical phase, we encounter biases that emerge during data analysis and model development. First, we have the broad category of cognitive bias. One prominent type is confirmation bias, which refers to the tendency to seek and interpret information that confirms pre-existing beliefs. In addition, reporting bias is a potential pitfall where certain findings are highlighted or suppressed, shaping the narrative around the data.

8. Bias in model development

The integration of algorithms introduces new dimensions of bias. Algorithmic bias occurs when machine learning models reflect the biases present in the training data, potentially leading to discriminatory outcomes. Additionally, automation bias can occur when people trust automated systems to the point where they rely on them too much, even when the system might be wrong. More on data analysis bias in chapter three.
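As a toy illustration of algorithmic bias, the sketch below "trains" a very simple model on hypothetical historical hiring records. The records, the group labels, and the helper names `fit_hire_rates` and `predict` are all invented for this example and do not come from any real dataset or library. Every candidate in the data is equally qualified, but past decisions favored group A, and the model reproduces that pattern.

```python
from collections import defaultdict

# Hypothetical historical records: (group, qualified, hired).
# All candidates are qualified, but group A was hired far more often.
history = (
    [("A", True, True)] * 80 + [("A", True, False)] * 20
    + [("B", True, True)] * 40 + [("B", True, False)] * 60
)


def fit_hire_rates(records):
    """Learn the historical hire rate for each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [hired, total]
    for group, _qualified, hired in records:
        counts[group][0] += int(hired)
        counts[group][1] += 1
    return {group: hired / total for group, (hired, total) in counts.items()}


def predict(rates, group, threshold=0.5):
    """Recommend hiring when the learned rate for the group meets the threshold."""
    return rates[group] >= threshold


rates = fit_hire_rates(history)
print(rates)                # {'A': 0.8, 'B': 0.4}
print(predict(rates, "A"))  # True
print(predict(rates, "B"))  # False
```

Real machine learning models are far more complex than a per-group rate, but the mechanism sketched here is the same one described above: the model has no notion of fairness, so it reflects whatever imbalance the training data contains.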

9. Let's practice!

Now, let's dive deeper into these concepts with some practical examples!
