
Working with different model parameters

1. Working with different model parameters

Hello! Welcome to this video on working with different model parameters.

2. Model parameters in Amazon Bedrock

Let's explore how to control model outputs by adjusting key parameters. Different foundation models in Bedrock have their own parameter configurations for controlling model behavior. We'll focus on common parameters like temperature, which controls randomness in predictions; top_p, which controls the diversity of the model's output; and max_tokens, which sets the maximum length of the output.

3. Temperature

Temperature controls response randomness and creativity. When you set a low temperature close to 0, the model produces focused and deterministic responses. This means it will consistently choose the most probable next words, leading to predictable outputs. Setting a high temperature, closer to 1, encourages the model to generate diverse and creative outputs. The model becomes more willing to select from a broader range of possible words. Most Bedrock models come with a default temperature of 0.7, providing a good balance between creativity and focus. In our code example, we're using a low temperature of 0.2 to generate a focused tech article headline where we want consistency and clarity rather than creative variation.
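The effect described above can be sketched numerically. Temperature divides the model's raw next-token scores (logits) before they are turned into probabilities, so a low temperature sharpens the distribution toward the most likely word. The logit values below are hypothetical, just to illustrate the mechanism:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by temperature, then apply softmax.
    Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next words
logits = [2.0, 1.0, 0.5]

low_temp = softmax_with_temperature(logits, 0.2)   # focused, near-deterministic
high_temp = softmax_with_temperature(logits, 1.0)  # more evenly spread

print(low_temp)   # top word dominates (~0.99 probability)
print(high_temp)  # top word still leads, but alternatives stay viable
```

At temperature 0.2 the top word captures almost all of the probability mass, which is why low-temperature outputs are so consistent; at 1.0 the alternatives remain realistic choices.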

4. The range of temperature

Think of temperature as the model's 'risk appetite.' A low temperature is a cautious decision-maker, perfect for factual tasks like summarization and Q&A. A high temperature is a creative thinker, ideal for generating stories and brainstorming. It's like adjusting the model's personality from fact-checker to storyteller based on your needs.

5. Top_p

Top_p, or nucleus sampling, controls output diversity by restricting sampling to the smallest set of words whose cumulative probability reaches the top_p threshold - from 0.1 for focused responses to 0.9 for diverse ones. For example, a top_p of 0.1 means only the words making up the top 10% of the probability mass are considered for completions. Our code shows both in action: a focused response uses a low top_p, while diverse responses use higher values.
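The filtering step can be sketched as follows. The word probabilities here are made up for illustration; the function keeps the most probable words until their cumulative probability reaches the top_p threshold, then renormalizes:

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of words whose cumulative probability
    reaches top_p, then renormalize so the kept probabilities sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for word, p in ranked:
        kept[word] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {word: p / total for word, p in kept.items()}

# Hypothetical next-word distribution
probs = {"cat": 0.5, "dog": 0.3, "fox": 0.15, "owl": 0.05}

print(nucleus_filter(probs, 0.5))  # only the single most likely word survives
print(nucleus_filter(probs, 0.9))  # three candidates remain in play
```

With a low top_p the model is effectively forced to pick the single most likely word; with a high top_p it samples from a broader pool, which is what produces more varied phrasing.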

6. Max tokens

Max_tokens simply limits response length, helping manage costs and performance. You'll typically use values between 100 and 2000 tokens. Our code shows both in action: the focused shorter response uses low top_p with fewer tokens, while the creative longer response uses higher values for both parameters.
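In Bedrock, all three parameters are passed together in a single inference configuration. A minimal sketch, assuming the boto3 Converse API and a hypothetical model ID (swap in any model you have access to); here we only build the configuration dicts, without making a live API call:

```python
# Hypothetical model ID for illustration
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def build_inference_config(temperature, top_p, max_tokens):
    """Assemble the inferenceConfig dict used by the Bedrock Converse API."""
    return {"temperature": temperature, "topP": top_p, "maxTokens": max_tokens}

# Focused, shorter response: low temperature and top_p, fewer tokens
focused = build_inference_config(temperature=0.2, top_p=0.1, max_tokens=150)

# Creative, longer response: higher values for all three parameters
creative = build_inference_config(temperature=0.9, top_p=0.9, max_tokens=1000)

# With boto3, the config would be passed like this (not run here):
# client = boto3.client("bedrock-runtime")
# response = client.converse(
#     modelId=MODEL_ID, messages=messages, inferenceConfig=focused
# )
print(focused)
print(creative)
```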

7. Parameter selection

Let's look at how to choose the right parameters for different Bedrock use cases. For content generation, use a higher temperature between 0.7 and 0.9, while Q&A systems work better with lower temperatures of 0.1 to 0.3. Documentation needs a lower top_p around 0.1 to 0.3, but for brainstorming, increase it to 0.7-0.9. As for max_tokens, chat applications typically need 150 to 300 tokens, while long-form content requires 1000 or more.
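The guidance above can be captured as a small set of starting-point presets. These values are illustrative midpoints of the ranges just discussed, not official defaults; tune them for your workload:

```python
# Hypothetical starting points per use case, based on the ranges above
PRESETS = {
    "content_generation": {"temperature": 0.8, "topP": 0.9, "maxTokens": 1000},
    "qa":                 {"temperature": 0.2, "topP": 0.3, "maxTokens": 300},
    "documentation":      {"temperature": 0.3, "topP": 0.2, "maxTokens": 1000},
    "brainstorming":      {"temperature": 0.9, "topP": 0.8, "maxTokens": 500},
    "chat":               {"temperature": 0.7, "topP": 0.5, "maxTokens": 200},
}

def config_for(use_case):
    """Look up a preset inference configuration for a named use case."""
    return dict(PRESETS[use_case])  # copy, so callers can tweak safely

print(config_for("qa"))  # low temperature for factual consistency
```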

8. Let's practice!

Now let's move on to some practice exercises and experiment with these parameters!