Hugging Face pipelines for sentiment analysis

1. Hugging Face pipelines for sentiment analysis

Time to learn about Hugging Face pipelines for text classification, starting with sentiment analysis!

2. Recap: NLP workflow

In previous chapters, we covered the steps to prepare text for modeling, which was important for understanding how models interpret text. However, repeating this for every task can be slow and error-prone when we want to scale.

3. Hugging Face pipelines

Hugging Face pipelines simplify this process. A pipeline is a ready-made workflow that handles preprocessing, feature extraction, and modeling in one function call. Defining a pipeline requires an NLP task and a model to perform it.

4. Pipelines for sentiment analysis

Today, we'll use pipelines for sentiment analysis, a text classification task that predicts whether a text expresses a positive or negative emotion.

5. Models for text classification

To find suitable models for this task, we can visit huggingface.co/models, scroll down, and select the "Text Classification" category. This will display a list of models that can potentially be used for sentiment analysis. However, it's important to review each model's documentation to confirm whether it's intended for sentiment analysis or for general text classification, a topic we'll cover in the next video.

6. Pipelines in code

To create pipelines in code, we start by importing pipeline from the transformers library. We define a classification_pipeline, specifying the task as sentiment-analysis (an alias for text-classification) and a suitable model for this task. To classify the sentiment of a given movie review, we pass it to the classification_pipeline. The result includes a label (here, POSITIVE or NEGATIVE) and a confidence score. Output formats vary by model: some include additional labels like NEUTRAL, and others use different label names. Therefore, we should refer to the model's documentation to interpret the output correctly.
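The transcript does not include the slide's code, so here is a minimal sketch of what this step might look like. The model name is an assumption (a commonly used English sentiment model from the Hub), and build_classification_pipeline and format_result are hypothetical helper names introduced only for illustration.

```python
# Assumed model: not named in the source, chosen here only as an example.
DEFAULT_MODEL = "distilbert-base-uncased-finetuned-sst-2-english"


def build_classification_pipeline(model_name=DEFAULT_MODEL):
    """Create a sentiment-analysis pipeline for the given model."""
    # Imported inside the function so the formatting helper below
    # can be used even where transformers is not installed.
    from transformers import pipeline

    return pipeline(task="sentiment-analysis", model=model_name)


def format_result(result):
    """Turn one result dict ({'label': ..., 'score': ...}) into readable text."""
    return f"{result['label']} (confidence {result['score']:.2f})"


# Usage (downloads the model on first run):
# classification_pipeline = build_classification_pipeline()
# result = classification_pipeline("I really liked the movie!")
# print(format_result(result[0]))  # a single text still returns a list
```

Passing a single string returns a list containing one dictionary, which is why the usage lines index the result with [0].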

7. Sentiment analysis on a batch of texts

We can expand this to a list of various customer reviews that we pass to the classification_pipeline. Each result shows the predicted sentiment and its confidence. Keep in mind that no model is perfect! For example, the fourth sentence is sarcastic and clearly expresses a negative sentiment, yet the model incorrectly predicted it as positive.
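As a rough sketch of the batch case, the hypothetical helper below (classify_batch is not from the source) pairs each input text with its prediction. It assumes the classifier behaves like a Hugging Face text-classification pipeline: given a list of texts, it returns one dictionary per text with "label" and "score" keys.

```python
def classify_batch(classifier, texts):
    """Pair each text with its predicted label and confidence score.

    `classifier` is assumed to accept a list of strings and return a list
    of {'label': ..., 'score': ...} dicts in the same order.
    """
    results = classifier(texts)
    return [
        (text, result["label"], result["score"])
        for text, result in zip(texts, results)
    ]


# Usage, assuming classification_pipeline was created with transformers.pipeline:
# for text, label, score in classify_batch(classification_pipeline, reviews):
#     print(f"{label:<8} {score:.2f}  {text}")
```

Printing the text next to its label makes misclassifications, such as sarcastic reviews scored as positive, easier to spot by eye.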

8. Assessing sentiment analysis models

One way to assess how well a model performs before using it in a real-world setting is to test it on a batch of texts for which we already know the correct classifications, stored as true_labels. We feed these texts into the classification_pipeline and extract the predicted_labels using a list comprehension. We evaluate the model's performance by calculating the accuracy_score, which we import from sklearn.metrics. This metric compares true_labels with predicted_labels to determine how often the model made correct predictions. Here, the model achieved 80% accuracy. We might decide to evaluate other models on the same dataset and choose the one with the best performance for future use.
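The evaluation step can be sketched roughly as follows. To keep the example dependency-free, accuracy is computed by hand here rather than with sklearn's accuracy_score; for plain label lists like these, the two give the same fraction of correct predictions. The helper names predict_labels and accuracy are hypothetical.

```python
def predict_labels(classifier, texts):
    """Extract just the predicted label for each text via a list comprehension."""
    return [result["label"] for result in classifier(texts)]


def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true_labels and predicted_labels must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Usage, assuming classification_pipeline, texts and true_labels already exist:
# predicted_labels = predict_labels(classification_pipeline, texts)
# print(f"Accuracy: {accuracy(true_labels, predicted_labels):.0%}")
#
# The same comparison with scikit-learn:
# from sklearn.metrics import accuracy_score
# accuracy_score(true_labels, predicted_labels)
```

The true labels must use the same label names as the model's output (for example POSITIVE and NEGATIVE); otherwise every comparison fails and the accuracy is misleadingly low.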

9. Let's practice!

See how pipelines simplify our work by combining everything we learned in previous chapters to extract meaning from text? Let's practice using them!
