1. Using pre-trained LLMs
As we know, LLMs can perform a variety of language tasks.
2. Language understanding
These include language understanding tasks such as text classification, sentiment analysis, summarization, question answering, and more.
3. Language generation
We'll now explore two language generation tasks using pre-trained models: text generation and translation. This will deepen our understanding of LLMs' capabilities and give us a look at their underlying structure.
4. Text generation
Text generation involves producing coherent, meaningful, human-like text.
Here, we use a text generation pipeline to extend a user-provided prompt or input text about a tourist destination. The task is "text-generation" with the default model, although we can select a different text generation model from Hugging Face.
The max_length parameter limits the generated text length, and pad_token_id is set to generator.tokenizer.eos_token_id (short for end-of-sequence token ID).
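As a minimal sketch, the setup might look like the following; the prompt text and the max_length value of 50 are illustrative assumptions rather than the exact values from the slides.

from transformers import pipeline

# Instantiate a text generation pipeline with the default model
generator = pipeline(task="text-generation")

# An illustrative prompt about a tourist destination
prompt = "Kyoto is a popular tourist destination, known for"

# Limit the generated length and pad with the end-of-sequence token
output = generator(
    prompt,
    max_length=50,
    pad_token_id=generator.tokenizer.eos_token_id,
)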
5. Text generation
The pad_token_id parameter controls padding, which adds extra tokens to fill sequences up to the specified max_length so that all sequences are the same length, keeping the model efficient. We set it to the tokenizer's end-of-sequence token ID, which the model learned during training as the marker for the end of meaningful text. This helps the model recognize where to stop, so it only generates text up to the specified length or the end-of-sequence token.
Another parameter, truncation=True, can be added if the input is longer than the maximum length we have set. We won't need it for this example.
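If we did need it, the call would simply include the extra argument; this is only a sketch of that variation.

# Enable truncation for inputs longer than max_length (not needed here)
output = generator(
    prompt,
    max_length=50,
    truncation=True,
    pad_token_id=generator.tokenizer.eos_token_id,
)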
6. Text generation
The output for this model is retrieved using the generated_text key.
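Assuming output holds the pipeline's result as in the earlier sketch, extracting the text might look like this.

# The pipeline returns a list of dictionaries; the text is under "generated_text"
print(output[0]["generated_text"])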
Sometimes, the output may be suboptimal if the prompt lacks context. For instance, if the prompt is too vague or ambiguous, the generated text might not be relevant or coherent. If you recall, the original text talked about traditional housing, but the output is about trees here.
7. Guiding the output
We can control the output by being more specific in the prompt or including additional elements to guide the output.
Take this book review example. We've included a response element and combined the review and response into a single prompt using an f-string to guide the model.
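A sketch of this pattern, using a made-up review and response element, might look like the following.

# Hypothetical review and response element, combined into one prompt
review = "A gripping mystery with a satisfying ending. I couldn't put it down."
response = "Dear reader, thank you for your review."

prompt = f"Book review:\n{review}\n\nBook shop response to the review:\n{response}"

output = generator(
    prompt,
    max_length=100,
    pad_token_id=generator.tokenizer.eos_token_id,
)
print(output[0]["generated_text"])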
8. Language translation
With translation, we also generate new text based on an input. This time, however, we generate the text in another language while preserving the original meaning.
The Hugging Face hub has a complete list of supported translation tasks and models. Let's look at an example of translating from English to Spanish.
As usual, we start by instantiating the pipeline. We want the task "translation_en_to_es" with the default model. We'll use the same input text about traditional Japanese houses and include the clean_up_tokenization_spaces argument for a more polished output, which we extract from the translation_text key.
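As a rough sketch of that setup (the input sentence is an illustrative stand-in, and if no default model is available for this language pair, a model such as "Helsinki-NLP/opus-mt-en-es" can be passed explicitly via the model argument):

from transformers import pipeline

# Instantiate the English-to-Spanish translation pipeline
translator = pipeline(task="translation_en_to_es")

# Illustrative input text about traditional Japanese houses
text = "Traditional Japanese houses feature sliding doors and tatami mat floors."

# clean_up_tokenization_spaces tidies spacing in the decoded output
translation = translator(text, clean_up_tokenization_spaces=True)
print(translation[0]["translation_text"])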
9. Let's practice!
Now it's your turn to practice generating text.