The development phase

1. The development phase

When you hear about machine learning, you will likely think of fancy models such as neural networks that can beat world-class Go players. We have seen that it takes much more than a model to reap the benefits of machine learning for your company, but building the model remains an essential step. It happens in the development phase, which we will discuss next.

2. Important steps in the development phase

As we have learned previously, the development phase consists of four main steps, and we will now shed more light on what to do in each of them. Here we will move from the right to the left.

3. Data preparation or feature engineering

We start with data preparation.

4. Data preparation or feature engineering

Our data engineer collected and integrated all the data into a central database at the end of the design phase. We must now prepare this data so that the machine learning models can consume it. The input data to the model are called features, which is why the whole process is called feature engineering. What this entails depends on the model class. Some models, such as deep neural networks, tend to require less feature engineering. A typical task is grouping data. Assume we have many small neighboring countries with only a few customers each. We might want to group them, for example as the Nordic countries, to get better results and easier interpretation. Other common tasks are replacing missing data and dealing with extreme observations. Both the data engineer and the data scientist do the feature engineering, in close collaboration with the business expert. The data engineer is responsible for providing production-ready data, while the data scientist performs the mathematically complex feature engineering tasks, such as modeling missing data, often based on input from the business expert.
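
To make the three tasks above concrete, here is a minimal sketch in plain Python. The country list, the "Nordics" label, the spend values, and the clipping bounds are all made up for illustration, and median imputation is only one simple way to replace missing data; a real project would choose these together with the business expert.

```python
from statistics import median

# Hypothetical grouping: which countries to merge is a business decision.
NORDIC = {"Denmark", "Finland", "Iceland", "Norway", "Sweden"}


def group_country(country):
    """Map small neighboring countries to a single region label."""
    return "Nordics" if country in NORDIC else country


def impute_missing(values):
    """Replace missing values (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_extremes(values, lower, upper):
    """Cap extreme observations at the given lower and upper bounds."""
    return [min(max(v, lower), upper) for v in values]


# Illustrative raw data, not taken from any real data set.
countries = ["Norway", "Germany", "Iceland", "Sweden"]
spend = [120.0, None, 95.0, 10_000.0]

grouped = [group_country(c) for c in countries]
imputed = impute_missing(spend)
clipped = clip_extremes(imputed, lower=0.0, upper=1_000.0)

print(grouped)
print(clipped)
```

The median is used instead of the mean here because a single extreme value, like the 10,000 above, would otherwise distort the fill value.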

5. Model training or experimentation

Next is model training, which is at the core of machine learning.

6. Model training or experimentation

Here, the data scientist or machine learning engineer tunes and compares different models. Much of this is covered in other courses, such as Machine Learning for Business, so we will not go into detail here. Model training is also called an experiment. A critical point is to track each experiment carefully and log all results, and this logging should happen automatically rather than by hand. It is also common to go back iteratively to the data preparation step to include new features, transform existing ones differently, or drop features that do not help to improve model performance.
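
As a rough illustration of automatic logging, the sketch below records every training run in a list and picks the best one afterwards. The function names, model names, parameters, and accuracy values are hypothetical placeholders, not results from a real experiment, and teams often use a dedicated experiment tracking tool instead of code like this.

```python
import json
import time


def log_run(log, model_name, params, metrics):
    """Append one experiment record to the log and return it."""
    record = {
        "timestamp": time.time(),
        "model": model_name,
        "params": params,
        "metrics": metrics,
    }
    log.append(record)
    return record


def best_run(log, metric):
    """Return the logged run with the highest value of the given metric."""
    return max(log, key=lambda record: record["metrics"][metric])


experiment_log = []

# In a real project these calls would sit inside the training loop,
# so that no run can finish without being logged.
log_run(experiment_log, "logistic_regression", {"C": 1.0}, {"accuracy": 0.81})
log_run(experiment_log, "random_forest", {"n_trees": 200}, {"accuracy": 0.86})

best = best_run(experiment_log, "accuracy")

# The log is plain data, so it can be serialized and stored with the code.
serialized = json.dumps(experiment_log)
print(best["model"])
```

Because each record stores the parameters next to the metrics, we can later reproduce a run or see which feature and model choices were already tried.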

7. Model evaluation

Now that we have developed a good model, we evaluate it rigorously.

8. Model evaluation

This includes different steps, such as ensuring that the business requirements are met and that nothing important from the design phase is still outstanding. It can also mean, for example, ensuring that the model fulfills data privacy requirements. The evaluation stage might include model stress tests, which check, for example, what our model predicts in extreme economic times and how robust it is to different input data. Depending on the context, we also need to consider whether the model is fair and treats minority groups similarly to the majority population.
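
Two of these checks can be sketched in a few lines. The example below assumes a hypothetical scoring rule in place of a trained model and uses made-up applicants. It measures the gap in approval rates between groups, which is one simple fairness indicator among many, and it simulates an economic downturn by halving every income.

```python
def toy_model(income, debt):
    """Hypothetical scoring rule standing in for a trained model."""
    return 1 if income - 2 * debt > 10 else 0


def approval_rate(rows):
    """Share of rows for which the model predicts an approval."""
    preds = [toy_model(r["income"], r["debt"]) for r in rows]
    return sum(preds) / len(preds)


def parity_gap(rows, group_key):
    """Largest difference in approval rate between any two groups."""
    groups = {}
    for r in rows:
        groups.setdefault(r[group_key], []).append(r)
    rates = {g: approval_rate(members) for g, members in groups.items()}
    return max(rates.values()) - min(rates.values()), rates


def stress(rows, income_shock):
    """Scale every income by a factor to mimic extreme economic times."""
    return [{**r, "income": r["income"] * income_shock} for r in rows]


# Illustrative applicants, not real data.
applicants = [
    {"group": "A", "income": 40, "debt": 5},
    {"group": "A", "income": 30, "debt": 12},
    {"group": "B", "income": 35, "debt": 4},
    {"group": "B", "income": 50, "debt": 10},
]

gap, rates = parity_gap(applicants, "group")
baseline = approval_rate(applicants)
stressed = approval_rate(stress(applicants, income_shock=0.5))

print(rates, gap)
print(baseline, stressed)
```

A large gap or a collapse under stress does not by itself prove that the model is unfair or fragile, but it tells us where to look more closely before moving on.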

9. Testing and verification

Finally, we test our model and ensure it is fully working.

10. Testing and verification

Testing and verification is less about the statistical quality of a model and more about applying traditional software engineering best practices and tests. We want to make sure, for example, that the code runs robustly and quickly and that the model will not harm the broader system. We also want to test whether the model actually gives similar results on different systems. Keep in mind that testing ML software is much more challenging than testing traditional software because of the strong dependency on potentially changing data. Here, the software engineer and the machine learning engineer are often in charge.
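
The checks described above could look roughly like the following sketch. The predict function, its weights, the reference value, and the time budget are assumptions made for illustration; in a real project such checks would usually live in a test suite that runs automatically on every change.

```python
import time


def predict(features):
    """Hypothetical stand-in for a deployed model's predict function."""
    if len(features) != 3:
        raise ValueError("expected exactly 3 features")
    weights = [0.2, 0.5, 0.3]
    return sum(w * x for w, x in zip(weights, features))


def check_reproducible(reference, tolerance=1e-9):
    """Compare a prediction with a value recorded on another system."""
    return abs(predict([1.0, 2.0, 3.0]) - reference) <= tolerance


def check_rejects_bad_input():
    """Malformed input should fail loudly instead of harming the system."""
    try:
        predict([1.0])
    except ValueError:
        return True
    return False


def check_latency(max_seconds, calls=1000):
    """Many predictions should finish within the given time budget."""
    start = time.perf_counter()
    for _ in range(calls):
        predict([1.0, 2.0, 3.0])
    return time.perf_counter() - start <= max_seconds


print(check_reproducible(reference=2.1))
print(check_rejects_bad_input())
print(check_latency(max_seconds=5.0))
```

The reproducibility check compares within a small tolerance rather than for exact equality, because floating-point results can differ slightly between systems.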

11. Let's practice model development

That was a lot. Now let's practice!
