
Loading a pre-trained model

1. Loading a pre-trained model

We are now going to learn efficient ways of loading pre-trained ML models.

2. Current structure

Currently, an ML model is loaded only when a prediction request comes in, but this creates a challenge.

3. Challenge with loading models

Should we load our model each time a request comes in, or is there a better way? Let's think about our restaurant kitchen again: just like we'd want our oven preheated and ready when customers arrive so we can serve them quickly, we need our ML models loaded and ready to infer before users start sending requests.

4. Load models before the request

Essentially, we'd want our models to load as soon as the server starts up.

5. Loading the model

We already have a pre-trained ML model saved as a joblib file. First, we set the sentiment_model variable to None. Next, we define the load_model() function, which uses a global variable to store our model - this ensures it's accessible throughout our application. Inside it, we load the model using the pre-defined SentimentAnalyzer class, which takes the path to the joblib file, and assign the result to the sentiment_model variable. Whenever we want to load the model, we call load_model().
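A minimal sketch of this step might look as follows. The SentimentAnalyzer stub below stands in for the course's pre-defined class (its actual constructor may differ), and the sentiment_model.joblib filename is a placeholder:

    import joblib

    class SentimentAnalyzer:
        # Stand-in for the course's pre-defined class; here it
        # simply loads a pre-trained model from a joblib file
        def __init__(self, model_path):
            self.model = joblib.load(model_path)

    # The model starts as None and is populated at startup
    sentiment_model = None

    def load_model():
        # A global variable keeps the loaded model accessible
        # throughout the application
        global sentiment_model
        sentiment_model = SentimentAnalyzer("sentiment_model.joblib")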

6. FastAPI lifespan event

Now, here's where FastAPI's magic comes in! Using an async lifespan event, the model is loaded as soon as the server starts, and only then is the API ready to handle user requests.

7. FastAPI lifespan event

We import asynccontextmanager from contextlib; it's a decorator that creates an asynchronous context manager. We define our lifespan function, which takes the FastAPI app as a parameter and uses a yield statement to ensure resources are properly initialized: everything before the yield runs during app startup, which is where we call the previously defined load_model() function. Finally, we pass our lifespan function to FastAPI's constructor using the lifespan parameter. That's all we need - FastAPI handles the rest, ensuring our resources are properly managed throughout our app's lifecycle!
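Putting the slide together, a sketch of this setup, continuing the load_model() sketch above:

    from contextlib import asynccontextmanager
    from fastapi import FastAPI

    @asynccontextmanager
    async def lifespan(app: FastAPI):
        # Everything before the yield runs during app startup
        load_model()
        yield
        # Anything after the yield would run during shutdown

    # Pass the lifespan function to FastAPI's constructor
    app = FastAPI(lifespan=lifespan)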

8. Health checks

Before we start receiving requests, we need a way to check whether our API is ready and the model is loaded. That's why we add a health check endpoint: it returns a healthy status when the model is available to make predictions and an unhealthy status when it isn't. To test the endpoint, we can send a GET request to the local server using the curl command.
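One way such an endpoint might look, continuing the sketch above (the /health path is an assumption, as is the default uvicorn port 8000 in the curl call):

    @app.get("/health")
    def health_check():
        # Healthy only when the model was loaded at startup
        if sentiment_model is not None:
            return {"status": "healthy"}
        return {"status": "unhealthy"}

With the server running, we could then test it with, for example: curl http://localhost:8000/health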

9. Let's practice!

Fantastic, we can now load pre-trained models efficiently using lifespan events. Let's implement this now!