
Machine learning mistakes

1. Machine learning mistakes

Now that you've looked into some machine learning risks, let's dive into common machine learning mistakes from the business's point of view.

2. Mistakes

There are a number of common machine learning mistakes. This is not meant to be an exhaustive list, but these are some of the key things to look out for. We will review what happens when machine learning is started without proper preparation, understand why data availability is vital, discuss the importance of defining the target variable, look at why feature selection matters, and finally review how long model development leads to late testing and no impact.

3. Machine learning first

Machine learning should not be the first data initiative a company or team pursues. Key foundations need to be in place first, as we discussed earlier in the course with the data needs pyramid. It's not uncommon to pursue machine learning because it's all over the media, but there are prerequisites: the data must be ready and clean, and enough descriptive analysis must be done to understand the problem. Only then can the business identify opportunities for machine learning.

4. Not enough data

Having enough data is key to successful machine learning projects. Machine learning is always at risk when there is too little data or the data is too messy. This principle is summarized in the saying "Garbage in, garbage out": if ML is fed incorrect or poor-quality data, the output will be of the same low quality. Always make sure data availability and quality are checked off before pursuing machine learning.
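To make "checking off" data availability and quality a little more concrete, here is a minimal Python sketch, not a standard tool, that counts the rows available and the share of missing values per field. The function name `data_readiness_report` and the thresholds `min_rows` and `max_missing` are hypothetical; sensible thresholds depend on the problem and should be agreed with the business.

```python
from typing import Any


def data_readiness_report(rows: list[dict[str, Any]], fields: list[str],
                          min_rows: int = 1000, max_missing: float = 0.2) -> dict[str, Any]:
    """Summarize how much data there is and how much of it is missing.

    min_rows and max_missing are illustrative thresholds (assumptions), not standards.
    """
    n = len(rows)
    missing_share = {}
    for field in fields:
        # Count values that are absent, None, or empty strings.
        missing = sum(1 for r in rows if r.get(field) in (None, ""))
        missing_share[field] = missing / n if n else 1.0

    problems = []
    if n < min_rows:
        problems.append(f"only {n} rows (want at least {min_rows})")
    for field, share in missing_share.items():
        if share > max_missing:
            problems.append(f"{field}: {share:.0%} missing")

    return {"rows": n, "missing_share": missing_share,
            "problems": problems, "ready": not problems}


# Tiny illustrative dataset: one missing age, one empty plan.
rows = [
    {"age": 34, "plan": "basic"},
    {"age": None, "plan": "premium"},
    {"age": 51, "plan": ""},
]
report = data_readiness_report(rows, ["age", "plan"], min_rows=2, max_missing=0.3)
print(report["problems"])  # ['age: 33% missing', 'plan: 33% missing']
```

A check like this doesn't replace descriptive analysis, but it gives a quick, shared answer to "do we have enough usable data?" before a model is started.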

5. Target variable definition

Defining the business problem, and hence the target variable, is critical, as we've seen in parts of the previous videos. You should understand what it is that you're predicting: is it fraud, churn, a customer purchase, a production failure, a car crash? Can we observe the target variable, do we have a clear definition of it, or do we have to invent it? For example, churn can be contractual, meaning customers cancel a contract, like terminating a premium credit card. In that case it can be observed directly. Non-contractual churn, where customers simply stop using the service without explicitly cancelling, like switching to a different grocery store, is harder to observe. When the outcome is not directly observed, an in-depth analysis needs to be done before the model is built. The business teams also have to be involved in the definition phase and apply their field expertise to shape the target variable definition. This is one of the most critical steps in defining the machine learning problem.
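As an illustration of "inventing" a target variable for non-contractual churn, here is a minimal Python sketch that labels a customer as churned when their last purchase is more than a fixed number of days before a reference date. The function name `label_non_contractual_churn` and the 90-day inactivity window are hypothetical; in practice the window is exactly the kind of rule the business teams should help define.

```python
from datetime import date, timedelta


def label_non_contractual_churn(last_purchase: dict[str, date], as_of: date,
                                inactivity_days: int = 90) -> dict[str, bool]:
    """Label each customer as churned (True) if their last purchase is more than
    inactivity_days before as_of.

    The 90-day default is a hypothetical business rule, not a standard definition.
    """
    cutoff = as_of - timedelta(days=inactivity_days)
    # Strictly before the cutoff means more than inactivity_days without a purchase.
    return {customer: last < cutoff for customer, last in last_purchase.items()}


last_purchase = {"cust_a": date(2024, 1, 5), "cust_b": date(2024, 5, 20)}
print(label_non_contractual_churn(last_purchase, as_of=date(2024, 6, 1)))
# {'cust_a': True, 'cust_b': False}
```

With contractual churn, no such rule is needed: the cancellation itself is the observed label. Here, changing `inactivity_days` changes who counts as churned, which is why the definition deserves careful analysis before any model is built.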

6. Feature selection

Feature selection is a very important step as well, and under-investing in it is a critical mistake. There are different things to consider depending on whether the model is inferential or predictive. In the inference case, we need to choose variables we can control and affect, like latency, price, delivery terms, or service. It's important that the business uses its expertise to identify these levers. In the prediction case, the process is more iterative and gives the machine learning team more freedom. First, start with readily available data and build a simple model. If it performs well enough, test it in the market. If it works, iteratively add new features from less readily available sources and keep testing.

7. Late testing, no impact

You've seen the A/B testing chart before. Once the target variable and features are defined and a decent baseline model is built, we have to test it as quickly as possible to see whether it is actionable. Too often, machine learning teams pursue model perfection and spend months iterating through incremental improvements without any market testing. Make sure the ML team has a target due date for market testing.

8. Let's practice!

Great progress! Let's see if we can identify some of the ML mistakes in the exercises!