
Model deployment

1. Model deployment

In the past, the purpose of a fine-tuned model was to produce insights for a written report. Nowadays, it is more common to deploy models in a production environment, such as a website or a smartphone app. In this lesson, you will learn how to prepare your model for deployment.

2. Pushing to production

Consider a smartphone app that uses machine learning to optimize the user experience. If you fit a classifier to some training data on your laptop, you must then move a copy of it to the production server, where it will start to label new examples. This process of exporting a model to a production server is called "deployment" or "pushing to production".

3. Serializing your model

Moving models across computers works just like moving any other file: you save them to disk and copy them. Given that models are complex combinations of code and numbers, it is more efficient to use a binary format rather than a text format. This process is often referred to as "serialization", and in Python it can be done using the pickle module. To access the filesystem, use the built-in open() function. Note the second argument of open(): "w" stands for "write", and "b" for binary. Then use the .dump() method from pickle to write the model to file. To read from file, use "rb" in open(), and the .load() pickle method instead.
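As a minimal sketch of this workflow, assuming a training set X_train, y_train already exists and using a placeholder classifier and file name:

```python
import pickle

from sklearn.ensemble import RandomForestClassifier

# Fit a classifier on the training data (X_train and y_train are
# assumed to be defined earlier in your script).
clf = RandomForestClassifier().fit(X_train, y_train)

# Serialize the fitted model to disk: "w" for write, "b" for binary.
with open("model.pkl", "wb") as file:
    pickle.dump(clf, file)

# Later, on the production server, read it back: "r" for read, "b" for binary.
with open("model.pkl", "rb") as file:
    clf = pickle.load(file)
```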

4. Serializing your pipeline

What if we also wanted to perform feature selection? Without using the pipeline module, we would have to export both the feature selector and the model from our development environment.

5. Serializing your pipeline

You would also need to modify the script running on the production server to first transform the new data using the feature selector, and only then feed it to the classifier. This goes against the following simple advice: if possible, use a single object as your only interaction with the production server. Production environments are complicated and, if they break because of a bug, the result can be financial loss. So keep your interaction with them as simple as possible.

6. Serializing your pipeline

All this is yet another reason to use pipelines. Consider this pipeline involving feature selection and model fitting. The output of grid search CV is not just a list of optimal parameter values, but an actual estimator object that supports a .predict() method. This means the pipeline can be serialized and used just like a model.
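One way this could look, assuming X_train and y_train exist and using an illustrative feature selector, classifier, and parameter grid:

```python
import pickle

from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Chain feature selection and model fitting into a single pipeline.
pipe = Pipeline([
    ("feature_selection", SelectKBest(f_classif)),
    ("classifier", RandomForestClassifier()),
])

# Tune both steps with grid search (the parameter grid is illustrative).
params = {
    "feature_selection__k": [5, 10],
    "classifier__max_depth": [3, 10],
}
grid_search = GridSearchCV(pipe, param_grid=params)
grid_search.fit(X_train, y_train)

# The fitted grid search is itself an estimator with a .predict() method,
# so it can be serialized just like a plain model.
with open("pipe.pkl", "wb") as file:
    pickle.dump(grid_search, file)
```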

7. Serializing your pipeline

Moreover, the .predict() method of a pipeline object applies all transformations that were part of it during fitting, like feature selection, and in this way hides them from the production script. This type of encapsulation is good coding practice.
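On the production server, the script can then look as simple as this sketch, assuming the serialized pipeline was copied over as pipe.pkl and new data arrives as X_new:

```python
import pickle

# The serialized pipeline is the only object the production script touches.
with open("pipe.pkl", "rb") as file:
    model = pickle.load(file)

# .predict() applies feature selection and then the classifier internally.
predictions = model.predict(X_new)
```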

8. Custom feature transformations

Pipelines allow you to chain several data transformations before the classifier. You can even handle a custom transformation using the FunctionTransformer() function from the preprocessing module. Consider the credit dataset from previous lessons, and assume you wanted to take the negative of the second column. There are two small complications. First, you have to copy the dataset rather than modify it in place, to avoid unintended side effects in the rest of your script. Second, your function must treat the data as a numpy array, because pipelines internally convert pandas DataFrames into numpy arrays. This is why we use numpy indexing inside this function rather than pandas .iloc. Once that is done, all you need to do is wrap your function in FunctionTransformer() and plug it into your pipeline.
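A minimal sketch of such a custom step, with an illustrative function name and classifier, and assuming X_train and y_train exist:

```python
import numpy as np

from sklearn.preprocessing import FunctionTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier

def negate_second_column(X):
    # Copy the data to avoid modifying the caller's array in place.
    Z = X.copy()
    # Use numpy indexing: pipelines pass the data along as a numpy array.
    Z[:, 1] = -Z[:, 1]
    return Z

# Wrap the function and plug it into the pipeline before the classifier.
pipe = Pipeline([
    ("negate", FunctionTransformer(negate_second_column)),
    ("classifier", RandomForestClassifier()),
])

pipe.fit(X_train, y_train)
```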

9. Production ready!

Following scikit-learn best practices when you build your workflows makes pushing to production simple and safe. Get ready to have some real impact on live systems. But first, time to practice!