The design phase

1. Important tasks in the main MLOps stages

We will shed more light on each of the three main stages in the following three lessons. Let's start with the design phase.

2. The design phase

This is the most critical stage: if we fail to identify a relevant business objective, or cannot identify and obtain the right data, our MLOps application is doomed from the start.

3. Identify and prioritize use cases

Often we have several business needs at hand where machine learning can make a difference. In this case, we should estimate their business relevance, prioritize them and, at least initially, design, develop, and deploy only one MLOps application at a time.
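To make the prioritization step concrete, here is a minimal sketch. The scoring scale, the `UseCase` class, the example use cases other than credit scoring, and the rule of multiplying business value by feasibility are all illustrative assumptions, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    business_value: int  # hypothetical scale: 1 (low) to 5 (high)
    feasibility: int     # hypothetical scale: 1 (hard) to 5 (easy)


def prioritize(use_cases):
    """Sort use cases by business_value * feasibility, highest first."""
    return sorted(
        use_cases,
        key=lambda u: u.business_value * u.feasibility,
        reverse=True,
    )


# Made-up candidates and scores, for illustration only.
candidates = [
    UseCase("credit scoring", business_value=5, feasibility=4),
    UseCase("churn prediction", business_value=4, feasibility=3),
    UseCase("demand forecasting", business_value=3, feasibility=5),
]

ranked = prioritize(candidates)
first_project = ranked[0]  # the one application we start with
```

In practice the scores would come from a discussion with the business stakeholders; the point is only that we rank the candidates and pick a single one to begin with.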

4. Business requirements

Once we decide on one use case, we must understand the business requirements. That does not mean fully specifying how the MLOps application should look in the end, but nailing down its essential characteristics and ensuring it will actually serve our business. To give you an example: I worked on a credit scoring project in my previous role. There, you might define the desired accuracy or the desired reduction in unpaid advance payments. You might also state how often the scoring should be updated. As a further requirement, we might need to be able to explain why the application assigned, for example, different scores to women than to men.
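One way to keep such requirements from getting lost is to write them down in a form that can be checked later. The sketch below assumes three requirements from the credit scoring example; the class name, field names, and threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessRequirements:
    min_accuracy: float            # desired accuracy of the scoring model
    max_days_between_updates: int  # how often the scoring must be updated
    must_explain_scores: bool      # whether score differences must be explainable


def meets_requirements(req, accuracy, days_since_update, has_explanations):
    """Return True if a candidate solution satisfies every stated requirement."""
    return (
        accuracy >= req.min_accuracy
        and days_since_update <= req.max_days_between_updates
        and (has_explanations or not req.must_explain_scores)
    )


# Hypothetical values for illustration only.
credit_scoring = BusinessRequirements(
    min_accuracy=0.85,
    max_days_between_updates=30,
    must_explain_scores=True,
)

ok = meets_requirements(credit_scoring, accuracy=0.90,
                        days_since_update=7, has_explanations=True)
not_ok = meets_requirements(credit_scoring, accuracy=0.90,
                            days_since_update=7, has_explanations=False)
```

The second call fails only because explanations are missing, which shows that a non-functional requirement such as explainability can block a model that is otherwise accurate enough.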

5. Assume you have the team and technology

As discussed earlier, let's, for now, assume we have the right people equipped with an excellent technological base, well-designed cross-team incentives, and a thriving culture.

6. Create a project plan

We now come together and align on a project management methodology such as Agile, a modern, collaborative way to develop software that is related to DevOps. We discuss the infrastructure, available resources, risks, and much more. All team members should be involved here. But we are not yet sure whether the project is feasible, mainly because we have not yet identified the right data.

7. Identifying the right data

Next, we need to identify the right data. Right means that the data needs to be relevant to the business problem. Data is the crucial ingredient of our MLOps application. The catchphrase "garbage in, garbage out" is, unfortunately, true. The data must also be reliable and available, often in real time.

8. Understand and explore your data

An additional requirement is that the data is of high quality. That means, for example, that there are not too many missing data points, not too many extreme observations, and that the desired information is actually there. I have often worked with transactional data from sales processes. This data was huge and clean, but it often did not contain any information about customers beyond their name and location. That might be an issue. The data scientist needs to work closely with the data engineer, and particularly with the business expert, to identify and initially explore the data.
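A first exploration of data quality can be as simple as counting missing values and extreme observations. The sketch below uses only the Python standard library; the column names, the sample rows, and the interquartile-range rule with a factor of 1.5 are assumptions chosen for illustration, not the only reasonable definition of "extreme".

```python
import statistics


def missing_share(rows, column):
    """Fraction of rows in which `column` is absent or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def extreme_share(values, k=1.5):
    """Fraction of values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    extreme = sum(1 for v in values if v < low or v > high)
    return extreme / len(values)


# Made-up transactional rows: amounts are complete, segments are partly missing.
rows = [
    {"amount": 10, "segment": "retail"},
    {"amount": 12, "segment": None},
    {"amount": 11, "segment": "retail"},
    {"amount": 13, "segment": "business"},
    {"amount": 12, "segment": "retail"},
    {"amount": 11, "segment": None},
    {"amount": 10, "segment": "business"},
    {"amount": 500, "segment": "retail"},
]

segment_missing = missing_share(rows, "segment")
amount_extreme = extreme_share([row["amount"] for row in rows])
```

Whether a given share of missing or extreme values is acceptable is a question for the business expert as much as for the data scientist.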

9. Ensure infrastructure is ready

In parallel, it is crucial to ensure early on that the available infrastructure will be able to meet expectations. One such expectation might be, for example, making the predictions available with a delay of at most 5 seconds. Once we know that the data is there and the project is feasible, we should set up the so-called data pipelines that automate data movement from input to endpoint, along with other technical elements such as automated code testing. This is mainly the task of the data engineer and the backend engineer.
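A latency expectation like the 5-second limit can be checked early with a very small timing harness. In the sketch below, `dummy_predict` is a hypothetical stand-in for the real prediction service, and the payload is made up; only the 5-second budget comes from the example above.

```python
import time

MAX_LATENCY_SECONDS = 5.0  # the budget from the example requirement


def measure_latency(predict, payload):
    """Call `predict(payload)` and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = predict(payload)
    elapsed = time.perf_counter() - start
    return result, elapsed


def dummy_predict(payload):
    # Hypothetical placeholder; a real check would call the deployed model.
    return {"score": 0.5}


result, elapsed = measure_latency(dummy_predict, {"applicant_id": 1})
within_budget = elapsed <= MAX_LATENCY_SECONDS
```

A check like this could later become one of the automated tests, so that a change which makes predictions too slow is noticed before it reaches users.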

10. Collect and integrate data

Once we are confident that the project is feasible in terms of both data and infrastructure, we systematically collect the data and feed it into our central database. This already includes some reformatting and data cleaning and is mainly the task of the data engineer. We also ensure that all the data stored there can be versioned, so we can always reproduce earlier results.
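To illustrate what versioning data means, here is a minimal sketch that derives a version identifier from the content of a dataset. The function name and the choice of a truncated SHA-256 hash over a canonical JSON dump are assumptions for illustration; real projects typically rely on dedicated data versioning tooling.

```python
import hashlib
import json


def dataset_version(rows):
    """Return a short, content-based version identifier for a list of records."""
    # Sorting keys gives the same bytes for the same content,
    # regardless of the key order inside each record.
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Made-up records for illustration.
snapshot = [{"customer": "A", "amount": 10}, {"customer": "B", "amount": 12}]
same_content = [{"amount": 10, "customer": "A"}, {"amount": 12, "customer": "B"}]
cleaned = [{"customer": "A", "amount": 10}, {"customer": "B", "amount": 13}]

v1 = dataset_version(snapshot)
v1_again = dataset_version(same_content)
v2 = dataset_version(cleaned)
```

Storing such an identifier next to every trained model tells us exactly which state of the data produced which result. Note that in this sketch the order of the records matters: reordering the rows yields a different identifier.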

11. Let's practice!

We are ready for the development phase, but first, let's practice!
