Common data mistakes

1. Common data mistakes

Let's explore the common mistakes people make when working with data.

2. Common mistakes about data

There are many mistakes that people can make when working with data. Reflecting on the data life cycle framework, the most common ones include:

3. Common mistakes about data

Not properly defining the problem or question that the data is intended to answer.

4. Common mistakes about data

Not collecting enough data, or collecting the wrong data, and therefore being unable to accurately answer the defined question.

5. Common mistakes about data

Lacking appropriate statistical methods or tools for the specific type of data and research question.

6. Common mistakes about data

And lastly, as seen before, not communicating the results effectively. Planning ahead might help reduce these mistakes, but let's discuss some examples.

7. Not clearly defining the problem

Say, for example, that you want to know more about the purchase habits of a group of customers. Asking a question such as "Did you buy anything in the last month?" might tell you whether they bought something, but the question is too vaguely defined to yield actual insights. "Where did you make your last purchase?" or "Which payment method did you use?" might be better alternatives to get the data you need. Without a clearly defined question for the problem at hand, you risk inappropriate data collection and analysis and, ultimately, incorrect conclusions.

8. Insufficient or wrong data

Now, say that you're interested in the payment methods of elderly people, and you collect data through an online survey. Without your realizing it, most of the responses come from young adults, and only very few from elderly people. This is an example of data bias: the responses that you get back are not representative of the target audience your question focuses on. Although this survey data might give you some insights about the purchase habits of young adults, it does not allow you to answer the research question. Note that even the right data still needs proper cleaning or processing before analysis.
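One way to catch this kind of bias early is to check how many respondents actually belong to the target group before analyzing anything. The sketch below is only an illustration: the ages, the cutoff of 65, and the function name `share_in_target` are all assumptions made up for this example, not part of any real survey.

```python
# Hypothetical respondent ages from the online survey (illustrative values only).
respondent_ages = [22, 25, 31, 19, 28, 67, 24, 35, 29, 71, 23, 27]

# Assumed cutoff for the target group; the right threshold depends on your question.
TARGET_MIN_AGE = 65


def share_in_target(ages, min_age):
    """Return the fraction of respondents aged min_age or older."""
    if not ages:
        raise ValueError("no responses to check")
    in_target = sum(1 for age in ages if age >= min_age)
    return in_target / len(ages)


share = share_in_target(respondent_ages, TARGET_MIN_AGE)
print(f"{share:.0%} of respondents are in the target group")
```

If only a small share of respondents falls in the target group, as in this made-up sample, the survey is unlikely to answer a question about elderly people, no matter how carefully the rest of the analysis is done.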

9. Lack of appropriate analysis

OK, so let's suppose you asked the right questions and collected the right data. Jumping to conclusions without proper analysis won't work. Say, for example, that the data tells you that there was a steep decline in the number of contactless payments. You could quickly conclude that elderly people are less inclined to use this payment method. However, unknown to you, there were many technical issues with payment terminals in the last week, which might explain the decrease in contactless payments. This is an example of a lack of context, which may lead to misinterpreting the results. There are of course other reasons for data analysis mistakes, such as using incorrect aggregations or calculations, or confusing correlation with causation.
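To make the "incorrect aggregations" point concrete, here is a minimal sketch of one common slip: averaging per-group percentages instead of computing the percentage over all records. The store names and payment counts are invented for illustration, and the helper names are hypothetical.

```python
# Hypothetical payment counts per store (illustrative values only).
stores = [
    {"name": "A", "contactless": 90, "total": 100},
    {"name": "B", "contactless": 100, "total": 1000},
]


def mean_of_shares(rows):
    """Average the per-store shares, ignoring how many payments each store had."""
    return sum(r["contactless"] / r["total"] for r in rows) / len(rows)


def overall_share(rows):
    """Share of contactless payments across all payments combined."""
    return sum(r["contactless"] for r in rows) / sum(r["total"] for r in rows)


print(f"Mean of store shares: {mean_of_shares(stores):.1%}")
print(f"Overall share:        {overall_share(stores):.1%}")
```

The two numbers differ widely here because the small store counts as much as the large one in the first calculation. Which figure is appropriate depends on the question being asked, so it is worth stating explicitly which one you report.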

10. No clear communication of results

Finally, as mentioned before, presenting the results clearly is the most valuable part of the data life cycle. Failing to do so not only means that your work has been for nothing; it could, again, lead to misunderstandings or incorrect conclusions. Various things could go wrong at this stage. For example, you could have used very complex statistical techniques to analyze payment habits, but your manager lacks the technical knowledge and doesn't see how the analysis is relevant to the business. Or you might cherry-pick certain data points or use misleading chart types to make your case, even if the data doesn't actually support your argument. Or your visualizations may lack clear labels, legends, axis titles, or colors, increasing the chance of misinterpretation. DataCamp offers a wide variety of courses that cover this topic in more depth.

11. Let's practice!

Let's see if you can spot data mistakes in this last set of exercises.