Model packaging
1. Model packaging
We will close off this chapter by digging a bit deeper into an MLOps-worthy model package.
2. Into the wild
Model packaging marks the end of the ML development stage and our entry into Operations. This is when things get real, and, before we step into that wilderness, we need to ensure that our model package has everything we need to reach our core MLOps objectives. These are, once again:
3. Into the wild - deployment
smooth deployment,
4. Into the wild - reproducibility
reproducibility,
5. Into the wild - monitoring
and monitoring.
6. Model storage format options
First, we need our crown jewel, the trained model itself. We can save it in various formats, but ultimately, we must choose one that our model development framework can produce and our serving framework can load and run.
7. PMML and pickle
Two well-known examples are the PMML and pickle formats.
8. PMML
PMML is designed to be universal, allowing you to train a model using one programming language, then load it and serve it using an application written in a completely different one. The downside of such universal formats is that they can be pretty tricky to customize. Open-source tools give us the highest degree of freedom when custom models are required.
9. Pickle
Within the Python ecosystem, for example, the most common object storage format is the so-called pickle format. There is practically no limit to what you can store in it. Still, we lose on the side of cross-platform compatibility: a "pickled" model can only be loaded by another Python application that has the exact same libraries as the ones used during model training.
10. Pickle 2
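Because a pickled model depends on the libraries that were present at training time, one way to guard against mismatches is to record library versions next to the pickle and compare them before loading. The following is a minimal sketch of that idea, not a definitive implementation. The function names (`current_versions`, `package_model`, `load_model`), the package layout, and the choice of `scikit-learn` as the tracked dependency are all assumptions made for illustration, and a plain dictionary stands in for a trained model.

```python
import pickle
import platform
from importlib import metadata


def current_versions(package_names):
    """Look up the Python version and the installed version of each package."""
    versions = {"python": platform.python_version()}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return versions


def package_model(model, package_names):
    """Bundle the pickled model with the versions it was built against."""
    return {
        "dependencies": current_versions(package_names),
        "model_bytes": pickle.dumps(model),
    }


def load_model(package):
    """Unpickle the model only if this environment matches the recorded one."""
    expected = package["dependencies"]
    names = [name for name in expected if name != "python"]
    actual = current_versions(names)
    mismatches = {
        name: {"expected": expected[name], "found": actual[name]}
        for name in expected
        if expected[name] != actual[name]
    }
    if mismatches:
        raise RuntimeError(f"Dependency mismatch: {mismatches}")
    return pickle.loads(package["model_bytes"])


# A plain dict stands in for a trained model (hypothetical example data).
toy_model = {"weights": [0.5, -1.2], "bias": 0.1}
package = package_model(toy_model, ["scikit-learn"])
restored = load_model(package)
```

Requiring an exact version match is the strictest possible policy; a real serving setup might choose to tolerate patch-level differences instead.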
So, if pickle is our format of choice, we must store the list of model dependencies within the package metadata and use it to verify compatibility on the serving side. Bottom line, there is no free lunch, so choose your model storage format carefully.
11. Reproducibility 1
Then, we must ensure reproducibility. As mentioned previously, a model is reproducible if we can recreate it in an automated manner at any point in time. Being able to do that proves that we control our model production process to the finest level of detail. We will not lay out the reproduction procedure step-by-step, but let's list the ingredients you need to have in your package when the time comes.
12. Reproducibility 2
They are: a pointer to the exact version of the model build pipeline code; a pointer to the exact versions of the datasets used during training, including the train/test splits used during performance evaluation; and a record of the performance achieved on the test set.
13. Monitoring
Finally, we want to monitor our model in production. Whether monitoring is implemented within the model-serving app or delegated to another service, the prerequisite is that data profiles, which contain our expectations about the input and output data, are saved within the model package.
14. Lock 'n' load!
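The reproducibility and monitoring ingredients listed above can be gathered into a single metadata record that travels with the model. The sketch below shows one possible shape for such a record; it is an illustration under stated assumptions, not a standard format. The field names, the helper functions (`fingerprint`, `profile`, `build_manifest`, `out_of_profile`), the toy datasets, and the metric value are all hypothetical, and the data profile is reduced to a simple min/max range per input feature. Expectations about model outputs could be stored in the same way.

```python
import hashlib
import json


def fingerprint(rows):
    """Hash a dataset (here a list of dicts) so its exact version can be referenced."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def profile(rows, feature):
    """A deliberately simple data profile: the observed range of one numeric feature."""
    values = [row[feature] for row in rows]
    return {"min": min(values), "max": max(values)}


def build_manifest(code_version, train_rows, test_rows, test_metrics, features):
    """Collect the reproducibility and monitoring ingredients in one record."""
    return {
        "code_version": code_version,
        "datasets": {
            "train": fingerprint(train_rows),
            "test": fingerprint(test_rows),
        },
        "test_metrics": test_metrics,
        "data_profiles": {name: profile(train_rows, name) for name in features},
    }


def out_of_profile(record, profiles):
    """Return the features of an incoming record that fall outside the stored ranges."""
    return [
        name
        for name, expected in profiles.items()
        if not (expected["min"] <= record[name] <= expected["max"])
    ]


# Toy data and placeholder values, made up purely for illustration.
train = [{"age": 25, "income": 40000}, {"age": 52, "income": 88000}]
test = [{"age": 37, "income": 61000}]
manifest = build_manifest(
    code_version="<commit hash of the build pipeline>",  # placeholder pointer
    train_rows=train,
    test_rows=test,
    test_metrics={"accuracy": 0.0},  # placeholder, not a real result
    features=["age", "income"],
)
flagged = out_of_profile({"age": 70, "income": 50000}, manifest["data_profiles"])
```

A monitoring service could run a check like `out_of_profile` on each incoming request and raise an alert when the list of flagged features is non-empty.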
We're locked and loaded and ready to go! Excited? Not so fast.
15. Let's practice!
First, let's establish what we have learned!