1. Handling different input types in FastAPI
It's time to dive deeper into handling requests via prediction endpoints. Different models accept different types of input data, so let's learn how to handle the input data types specific to our models.
2. Restaurant vs API
Just as a kitchen has different stations to handle various ingredients - a grill for meats, a prep station for vegetables, and a pastry station for desserts - our API needs special ways to process different types of data.
3. Validation flow
Here's how our API will work.
First, incoming data arrives at our endpoint. Pydantic models act like our quality control, ensuring the data meets our requirements.
Then, we process each type of data appropriately: for example, numerical data is transformed to match the model's requirements, and text data may be analyzed for keywords.
Finally, we pass the processed input to the model for prediction and return the response to clients.
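To make the quality-control step concrete, here's a minimal sketch assuming a hypothetical input model with one numerical field and one text field. Pydantic raises a ValidationError whenever incoming data doesn't match the declared types:

```python
from pydantic import BaseModel, ValidationError

# Hypothetical input model, for illustration only
class ExampleInput(BaseModel):
    value: float
    text: str

# Valid data passes quality control
ExampleInput(value=3.5, text="hello")

# Invalid data is rejected before it ever reaches the model
try:
    ExampleInput(value="not a number", text="hello")
except ValidationError as err:
    print(err)  # reports that 'value' is not a valid float
```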
4. Comment moderation system
We're going to build a content moderation system. To implement this, we'll need to handle different types of input data.
Our moderation API has two main jobs.
First, it processes numerical comment metrics - the length of the comment, user_karma for the user's reputation score, and report_count for the number of times the comment has been reported - to score the comment using a model.
Second, it processes comment text to analyze sentiment using a sentiment analysis model.
Each type of data follows its own path through our API, with specific validation and processing steps.
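Here's a minimal sketch of the two Pydantic models these paths could use. The CommentMetrics fields come from the metrics described above; the field name inside CommentText is an assumption for illustration:

```python
from pydantic import BaseModel

class CommentMetrics(BaseModel):
    length: float        # length of the comment
    user_karma: float    # reputation score of the user
    report_count: float  # number of times the comment has been reported

class CommentText(BaseModel):
    text: str  # the raw comment to analyze (field name assumed)
```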
5. Endpoint for floating point numbers
Let's look at our comment score prediction endpoint for numerical data. We have created the app and loaded our pre-trained ML model that scores comments based on their features.
The next step is to create a POST endpoint at "/predict".
Then, we define our function, which takes the numerical data as a CommentMetrics parameter. FastAPI automatically validates this input against our Pydantic model CommentMetrics.
Next comes data preparation. In the predict_score function, we convert our input data into a 2D NumPy array, reshaping it because the model expects tabular data with rows and columns.
We then pass it to our pre-trained ML model, loaded in the pre-defined CommentScorer class, to make predictions. This class is usually provided by the Data Science team, so we won't build it here.
Finally, we return the prediction in the response body.
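Putting those steps together, here's a minimal sketch of the endpoint, reusing the CommentMetrics model sketched earlier and assuming CommentScorer exposes a scikit-learn-style predict() method (its exact interface isn't shown here, so that's an assumption):

```python
import numpy as np
from fastapi import FastAPI

app = FastAPI()
model = CommentScorer()  # pre-defined class from the Data Science team; constructor assumed

@app.post("/predict")
def predict_score(comment: CommentMetrics):
    # Reshape the three features into one row of tabular data (1 row, 3 columns)
    features = np.array(
        [comment.length, comment.user_karma, comment.report_count]
    ).reshape(1, -1)
    score = model.predict(features)  # predict() interface assumed
    return {"score": float(score[0])}
```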
6. Endpoint for textual input
Another step in the moderation system is analyzing comment text. We have an analysis endpoint at "/analyze_text". This endpoint processes the comment text as a string and checks it for any issues.
Our function takes a comment as a CommentText parameter - it could be something like 'sign up for free'.
We define a list of forbidden keywords to look for, such as "spam", "hate", "fake", and so on.
We turn the comment into lowercase using the .lower() method to ensure we catch keywords regardless of capitalization.
We then use a list comprehension to find matching forbidden words.
Finally, the endpoint returns "issues" containing the matching keywords - in this case, ["sign up", "free"] - and a needs_moderation score, which is simply the length of the issues list.
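Here's a minimal sketch of that endpoint, continuing the same app and CommentText model from the earlier sketches. The keyword list extends the examples above with "sign up" and "free", inferred from the sample response:

```python
@app.post("/analyze_text")
def analyze_text(comment: CommentText):
    # Keywords to flag; "sign up" and "free" inferred from the example response
    forbidden_words = ["spam", "hate", "fake", "sign up", "free"]
    # Lowercase the comment so matching is case-insensitive
    text = comment.text.lower()
    # Collect every forbidden keyword that appears in the comment
    issues = [word for word in forbidden_words if word in text]
    return {"issues": issues, "needs_moderation": len(issues)}
```

For the comment 'sign up for free', this returns {"issues": ["sign up", "free"], "needs_moderation": 2}.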
That's how we use basic text processing to handle text data. The text could then be passed to a sentiment analysis model for deeper analysis.
7. Let's practice!
Time to practice handling different input types!