MLOps case study: Increasing profits with MLOps
1. MLOps case study: Increasing profits with MLOps
Welcome to the final part of this course.
2. Real-life case study
Now we want to put everything we have learned into practice and conclude this course with a real-life case study. I will tell you about an MLOps project I worked on, not to sell you on how well we did, but to reflect on everything we discussed and show you where we succeeded and where we could have done better.
3. Case study: cooling water demand and availability
The project was about predicting the amount of cooling water required and available for a production site over the next two weeks. The business wanted a fairly accurate picture of potential production bottlenecks caused by insufficient cooling capacity.
4. Modeling cooling water
For that purpose, every hour we predicted how much cooling water would be required and available for each hour of the next two weeks. This means 336 forecasts (14 days × 24 hours) were automatically generated each hour. The results were presented in a dashboard to senior management and the engineers responsible for cooling capacity.
5. Modeling cooling water
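To make the figure of 336 forecasts concrete, here is a minimal sketch of how the hourly time steps of a single forecast run could be enumerated. The function name `forecast_timestamps` and the example run time are assumptions for illustration only; the project's actual code is not shown in this course.

```python
from datetime import datetime, timedelta

HORIZON_DAYS = 14   # two-week horizon, as described in the case study
HOURS_PER_DAY = 24


def forecast_timestamps(run_time: datetime) -> list[datetime]:
    """Return the hourly timestamps covered by one forecast run (hypothetical helper)."""
    # Align to the start of the current hour, then step forward one hour at a time.
    start = run_time.replace(minute=0, second=0, microsecond=0)
    n_steps = HORIZON_DAYS * HOURS_PER_DAY
    return [start + timedelta(hours=h) for h in range(1, n_steps + 1)]


# Example run time chosen arbitrarily for illustration.
stamps = forecast_timestamps(datetime(2023, 7, 1, 9, 30))
print(len(stamps))  # 336
```

Because a new run starts every hour, each run shifts this 336-step window forward by one hour.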
We based the forecast on internal data (sensor readings and information about production planning) as well as external data, such as an hourly weather forecast over the next two weeks.
6. The team
We worked with three and a half people on this project: a data scientist (that was me), who was also responsible for the dashboard; a data engineer, who was responsible for the automated data flow and its quality; and a data architect, who was in charge of the data-related infrastructure. Occasionally, we also brought in a dedicated backend engineer, for example, for cybersecurity-related questions. We didn't have a software engineer on board; consequently, our code could have been better documented and written more comprehensibly. That was a lesson learned for us.
7. Collaboration
We worked closely together in a DevOps style. Everyone was responsible for development as well as operations. We had a very good team culture, organized ourselves, and were fast to incorporate new critical feature requests or restore the application after a downtime.
8. Project progression
We had a clear business mandate, and we identified and automatically extracted the required data early on. The modeling worked well, and we could provide a baseline model within a few weeks, which we then improved over a more extended period with new and higher-quality data. We versioned all code, both modeling-related and data-related. Versioning allowed us to quickly restore previous results if something went wrong and to work in parallel on different components.
9. Project progression - infrastructure
The initially available on-premises infrastructure could have been better (there were no automated tests, for example), but over time the data architect successfully transitioned us to a better architecture based on the GitLab platform.
10. Conclusion
We deployed the model, and the business was delighted. They used the dashboard and the forecasts to make crucial business decisions about production planning. This information would otherwise not have been available, and it was critical, particularly during heat waves, for better planning and for reacting to feedback quickly. We had some downtimes, for example caused by unavailable input data, that we could have prevented with more MLOps experience and better tools. Unavailable input data is not only a data engineering issue; it also relates to the modeling, which could have been made more robust to missing inputs. In general, we fully met the business request, but admittedly, we failed to offer an application that we could easily streamline.
11. Our MLOps maturity
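One of the downtime causes named in the conclusion was unavailable input data. As a rough illustration of what "more robust to missing inputs" could mean, the sketch below forward-fills short gaps in an hourly sensor series and reports whether anything was imputed, so a dashboard could mark the forecast as degraded instead of failing. This is not what the project did; the function `fill_short_gaps`, the three-hour tolerance, and the sample readings are all assumptions.

```python
from typing import Optional

MAX_GAP_HOURS = 3  # hypothetical tolerance; a real value would depend on the sensor


def fill_short_gaps(
    values: list[Optional[float]], max_gap: int = MAX_GAP_HOURS
) -> tuple[list[Optional[float]], bool]:
    """Forward-fill up to `max_gap` consecutive missing readings.

    Missing readings beyond `max_gap` in a row, or before any valid reading,
    stay None. Returns the filled series and a flag that is True if at least
    one value was imputed.
    """
    filled = list(values)
    imputed = False
    last_value: Optional[float] = None
    gap_len = 0
    for i, value in enumerate(values):
        if value is not None:
            last_value = value
            gap_len = 0
        else:
            gap_len += 1
            if last_value is not None and gap_len <= max_gap:
                filled[i] = last_value
                imputed = True
    return filled, imputed


# Made-up readings for illustration; None marks an unavailable input.
readings = [21.5, None, None, 22.0, None, None, None, None, 23.1]
filled, imputed = fill_short_gaps(readings)
print(filled)
print(imputed)
```

Values that remain None after this step signal a gap too long to bridge safely; the application could then skip the affected forecast steps rather than go down entirely.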
Looking again at the Microsoft maturity model, we were between levels 2 and 3. We deployed automatically but failed to fully track the model development (which was mainly my fault as the data scientist). We definitely should have done this to speed up training and keep a better overview of all the insights we gained. We did not retrain our models automatically, but this was also not a priority, since we are talking here about physical relationships that hardly change over time. We monitored our system closely but could have integrated better automated tests to pinpoint the root cause of potential errors.
12. Let's practice!
Now let's practice before we conclude this course!