
Understanding the transformer

1. Understanding the transformer

We've practiced various LLM tasks; now it's time to explore how they are built.

2. What is a transformer?

Transformers are deep learning architectures for processing, understanding, and generating human-like text. Most of today's LLMs use this architecture because it can handle long text sequences in parallel rather than processing each word sequentially. The three common variants are encoder-only, decoder-only, and encoder-decoder. The complete structural details are out of scope for this course, so they are not shown here.

3. Transformer architectures

Each architecture specializes in specific tasks, and this is often detailed in the model card on Hugging Face. Sometimes, though, this information is missing, or in more advanced cases we need extra detail. We'll see how to investigate a model and identify its structure.

4. Encoder-only

The encoder-only architecture focuses on encoding and understanding input text without producing a sequential output, such as a sentence. It is commonly used for text classification, sentiment analysis, and extractive question-answering, where the output is an extract of text or a label. BERT-based models tend to use this structure.

5. Encoder-only

To identify the model architecture, print the model structure using llm.model or llm.model.config after loading it with the pipeline. We'll look for indicators like "encoder", such as the one seen for this BERT model, or an "architecture" or "task" element that will help us identify the architecture. Outputs are shortened in this video for brevity.
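The inspection above can be sketched as follows. This is a minimal illustration assuming the Hugging Face transformers library (with PyTorch installed); to avoid downloading a checkpoint, it builds a tiny BERT from a local config, which exposes the same kind of module tree and config object that llm.model and llm.model.config would show after loading with pipeline.

```python
from transformers import BertConfig, BertModel

# Build a tiny BERT locally (random weights, no download) to mirror what
# llm.model and llm.model.config expose after pipeline(...) loads a model.
config = BertConfig(hidden_size=32, num_hidden_layers=1,
                    num_attention_heads=2, intermediate_size=64)
model = BertModel(config)

# Printing `model` would show the full module tree; here we just check
# for the telltale "encoder" block (and the absence of a "decoder").
print(hasattr(model, "encoder"))   # True
print(hasattr(model, "decoder"))   # False
print(config.model_type)           # "bert"
```

Printing the full `model` or `config` gives the longer output shown in the video; checking for the "encoder" attribute is a quick programmatic shortcut.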

6. Encoder-only

Specifically, we can check llm.model.config.is_decoder and llm.model.config.is_encoder_decoder. is_decoder may give the answer directly, although it isn't always explicitly set.
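These flag checks look like the following sketch. It uses a locally constructed BertConfig as a stand-in for llm.model.config, since both are the same kind of object and no download is needed.

```python
from transformers import BertConfig

# A default BERT config exposes the same flags as llm.model.config
# would after loading a BERT model with pipeline(...).
config = BertConfig()

print(config.is_decoder)          # False: not a decoder
print(config.is_encoder_decoder)  # False: not an encoder-decoder either
# Both flags being False together points to an encoder-only model.
```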

7. Decoder-only

A decoder-only architecture focuses on the output, making it ideal for generative tasks like text generation and generative question-answering, where the answer is a sentence or paragraph. GPT-based models tend to use this architecture.

8. Decoder-only

We identify the structure in the same way, but this decoder GPT model is trickier: we have to rely on our knowledge of how the structure is typically used. The "text-generation" parameter means it is likely a decoder-only structure. For a GPT model like this, even if llm.model.config.is_decoder returns False, examining the structure and usage can indicate it's a decoder-only model.
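The GPT case can be demonstrated with a local GPT-2 config (a stand-in for llm.model.config, no download required): the flag stays at its default and is misleading on its own, so the model type and its typical text-generation usage are the real hints.

```python
from transformers import GPT2Config

config = GPT2Config()

# The flag alone is misleading for GPT-2: it is left at the default False,
# even though GPT-2 is a decoder-only model.
print(config.is_decoder)          # False
print(config.is_encoder_decoder)  # False

# The model type, plus its typical "text-generation" task, is the real clue.
print(config.model_type)          # "gpt2"
```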

9. Encoder-decoder

Finally, the encoder-decoder architecture combines the two, helping the model understand and process both the input and output for tasks like language translation and text summarization. This is typically found in T5 and BART models.

10. Encoder-decoder

Examining the output of llm.model shows us "decoder" and "encoder" elements.
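A small sketch of that inspection, assuming transformers with PyTorch: a tiny T5 built from a local config (random weights, no download) has the same top-level "encoder" and "decoder" stacks that printing llm.model would reveal for a full pretrained checkpoint.

```python
from transformers import T5Config, T5Model

# A deliberately tiny T5 so it builds quickly; the structure is the same
# shape as a full model's, just smaller.
config = T5Config(d_model=32, num_layers=1, num_heads=2,
                  d_ff=64, vocab_size=100)
model = T5Model(config)

print(hasattr(model, "encoder"))  # True
print(hasattr(model, "decoder"))  # True: both stacks are present
```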

11. Encoder-decoder

The llm.model.config.is_encoder_decoder attribute is more commonly set and helps identify an encoder-decoder structure by returning True.
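For example, comparing locally built configs (stand-ins for llm.model.config) shows T5 sets the flag explicitly, so the check is reliable here:

```python
from transformers import T5Config, BertConfig

t5_config = T5Config()
bert_config = BertConfig()

print(t5_config.is_encoder_decoder)    # True: T5 sets this explicitly
print(bert_config.is_encoder_decoder)  # False, for contrast
```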

12. Let's practice!

Have a go at identifying and selecting a suitable model based on its structure.
