
Auto Models and Tokenizers

1. Auto Models and Tokenizers

Welcome back!

2. Pipelines: fast and simple

Pipelines are a fantastic way to quickly perform tasks like text classification or summarization, as we explored previously. But what if you need more control?

3. Two ways to use Hugging Face models

If you explored a model card within the Hub, you likely noticed the “Use in Transformers” button.

4. Two ways to use Hugging Face models

This pops up a chunk of code that can be used to quickly get started with a model. There are two main ways to use models with the Transformers package: The first, which we already covered, is the high-level helper framework called pipeline.

5. Two ways to use Hugging Face models

The second is a more hands-on and flexible approach using Auto classes. Let’s see what makes Auto classes so powerful!

6. Auto Classes: flexible and powerful

Auto classes are a flexible way to load models, tokenizers, and other components without manual setup. They offer more control compared to pipelines, making them ideal for advanced tasks. While pipelines are great for quick experimentation, Auto classes let us customize every step.

7. AutoModels

Now that we’ve introduced Auto classes, let’s see how they simplify working with models. To download a model directly, we import the relevant AutoModel class for our task. For example, in text classification (also known as sequence classification), we use AutoModelForSequenceClassification. To load the model, we call the .from_pretrained() method and specify the model name.
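The loading step above can be sketched as follows. This is a minimal example assuming the transformers package is installed; the checkpoint name is just one example of a sentiment classification model from the Hub.

```python
from transformers import AutoModelForSequenceClassification

# Example checkpoint: a DistilBERT model fine-tuned for binary sentiment classification
model_name = "distilbert-base-uncased-finetuned-sst-2-english"

# Download (if needed) and load the model weights for sequence classification
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# The loaded config tells us how many output classes the model predicts
print(model.config.num_labels)
```

Swapping in a different checkpoint name is all it takes to load another model for the same task.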

8. AutoTokenizers

To prepare text input for use in a model, we use tokenizers. It’s recommended to use the tokenizer paired with the model to ensure the input is processed exactly as it was during training. With pipelines, this tokenizer and model pairing happens automatically. However, with Auto classes, you handle this step yourself. To retrieve the tokenizer for our model, first import AutoTokenizer from transformers. Then, call the .from_pretrained() method and pass in the model name.
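A minimal sketch of this step, reusing the same example checkpoint so the tokenizer matches the model:

```python
from transformers import AutoTokenizer

# Use the same checkpoint name as the model so input text is processed
# exactly as it was during training
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Calling the tokenizer on a string produces the model-ready inputs
encoded = tokenizer("Hello, Hugging Face!")
print(encoded["input_ids"])
```
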

9. Tokenizing text with AutoTokenizer

Tokenizers work by first cleaning the input, such as lowercasing words or removing accents, and then dividing the text into smaller chunks called tokens. In this example, we load the tokenizer paired with our model using .from_pretrained(). Then, we call the .tokenize() method to split the sentence into tokens. The output shows how the text is processed and broken into smaller parts that the model can understand.
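For illustration, here is what that looks like with the example checkpoint used above; since this tokenizer is uncased, the output tokens are lowercased:

```python
from transformers import AutoTokenizer

# Load the tokenizer paired with our example model
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# .tokenize() cleans the text and splits it into sub-word tokens
tokens = tokenizer.tokenize("AI: Making machines smarter!")
print(tokens)
```
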

10. Different models, different tokenizers

Different models may handle tokenization differently. While our example tokenizer processes text in a specific way, other models may produce different outputs for the same input.
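We can see this directly by tokenizing the same sentence with two different tokenizers. As an illustration, an uncased checkpoint lowercases the input while a cased one preserves capitalization, so their token lists differ:

```python
from transformers import AutoTokenizer

text = "Transformers are AMAZING!"

# An uncased tokenizer lowercases input before splitting it
uncased = AutoTokenizer.from_pretrained("distilbert-base-uncased")
# A cased tokenizer keeps the original capitalization
cased = AutoTokenizer.from_pretrained("bert-base-cased")

print(uncased.tokenize(text))
print(cased.tokenize(text))
```

This is exactly why the tokenizer should always come from the same checkpoint as the model.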

11. Building a Pipeline with Auto Classes

Now let’s create a custom pipeline. We start by importing the necessary modules, downloading a model and tokenizer, and combining them into a pipeline. By specifying the task, model, and tokenizer, we gain full control over the process for sentiment analysis.
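Putting the pieces together, a sketch of such a custom pipeline might look like this (again using the example sentiment checkpoint):

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          pipeline)

model_name = "distilbert-base-uncased-finetuned-sst-2-english"

# Download the model and its matching tokenizer explicitly...
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# ...then combine them into a pipeline, specifying the task ourselves
classifier = pipeline(task="sentiment-analysis", model=model,
                      tokenizer=tokenizer)

result = classifier("Auto classes give us so much more control!")
print(result)
```

Because we loaded the model and tokenizer ourselves, we could just as easily swap either component or customize it before building the pipeline.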

12. Use Cases for AutoModels and AutoTokenizers

In a nutshell, we reach for AutoModels and AutoTokenizers instead of simple pipelines when a task demands more control and customization. For example, advanced preprocessing and tokenization let us tailor text cleaning to a specific use case. In classification tasks, custom thresholding lets us weight predictions to prioritize certain categories, such as flagging 'Support' more readily in a customer support model. Lastly, in complex text analysis workflows with multiple processing stages, they give us precise control to customize and integrate each step.
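As one concrete illustration of custom thresholding, we can work with the model's raw logits instead of the pipeline's default argmax. The business rule below (only report NEGATIVE when the model is very confident) and the 0.9 threshold are hypothetical; the label order (index 0 = NEGATIVE, index 1 = POSITIVE) is read from the example checkpoint's config rather than assumed:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Encode a borderline sentence and get the raw, unnormalized scores
inputs = tokenizer("The service was okay, I guess.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities
probs = torch.softmax(logits, dim=-1)[0]

# Map each class index to its name using the model's own config
id2label = model.config.id2label
neg_prob = probs[[i for i, l in id2label.items() if l == "NEGATIVE"][0]]

# Hypothetical rule: only call it NEGATIVE above a 0.9 confidence threshold
threshold = 0.9
label = "NEGATIVE" if neg_prob > threshold else "POSITIVE"
print(label, probs.tolist())
```

This level of control over post-processing is simply not exposed by the high-level pipeline helper.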

13. Let's practice!

We’ve unlocked the power of AutoModels. Now, let’s dive into practice!
