Mastering Response Parameters
1. Mastering Response Parameters
Nice job with your first Responses API request! In this lesson, we'll look at additional parameters you can send in your Responses API requests to optimize for quality, behavior, and cost.
2. Model Selection
Let's start with the model parameter. The Responses API allows you to access many different OpenAI models. Here, we used gpt-5-mini, but there are others depending on the speed, reasoning, and cost requirements of the use case.
3. Model Selection
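As a minimal sketch, here is how the model parameter fits into a request. This only assembles the keyword arguments; actually sending the request assumes the openai Python package is installed and an API key is configured, so that part is shown in comments.

```python
# Sketch: selecting a model for a Responses API request.
# gpt-5-mini is the model used in this course; a larger model can be
# swapped in when the task demands more reasoning ability.
request_kwargs = {
    "model": "gpt-5-mini",
    "input": "Summarize the benefits of unit testing in one sentence.",
}

# With the openai package and an API key, these arguments would be
# passed straight to the client:
#   from openai import OpenAI
#   client = OpenAI()
#   response = client.responses.create(**request_kwargs)
#   print(response.output_text)
print(request_kwargs["model"])
```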
These are the three models available for you to use in this course. From top to bottom, reasoning ability and response quality increase at the expense of speed and cost. The best choice is often the cheapest model that meets our speed and quality needs. Because these models have reasoning capabilities, let's talk more about what's happening under the hood.
4. LLMs and Tokens
LLMs process text, both on input and output, as tokens. Tokens are units of one or more characters that language models use to understand and interpret text. Here, we can see how one model would encode this sentence as tokens to process it on input.
5. LLMs and Tokens
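To build intuition for token counts, here is an illustrative sketch. Real tokenizers use learned byte-pair encodings, not a fixed ratio; the commonly cited rule of thumb that one token is roughly four characters of English text is only a rough estimate.

```python
# Illustrative only: a rough token estimate using the common
# "one token is about 4 characters of English" rule of thumb.
# Real models use learned byte-pair encodings, so exact counts differ.
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    return max(1, len(text) // chars_per_token)

sentence = "Tokens are a unit of one or more characters."
print(estimate_tokens(sentence))  # rough estimate, not an exact count
```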
When a non-reasoning LLM responds to this input, it simply generates the tokens it deems most likely to follow the prompt. This method works well for simple question-and-answer tasks, and for many content creation tasks, but for more complex tasks like software development, research, and analysis, the quality of these outputs may be insufficient. This is where reasoning models come in.
6. LLMs and Tokens
Reasoning models generate intermediate reasoning tokens, sometimes called thinking tokens, which give the model additional space to break down a complex problem statement or task into smaller problems or sub-tasks. This greatly improves their accuracy and quality, but generating these intermediate reasoning tokens requires more time and cost. For reasoning models, we use two parameters to control this behavior: max_output_tokens and reasoning.
7. Reasoning Effort
The reasoning parameter takes a dictionary with an "effort" key, which tunes the amount of reasoning, or deep thinking, we expect the model to do when responding to our prompt.
8. Reasoning Effort
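As a sketch, tuning the reasoning effort looks like this. Again, only the request arguments are assembled here; sending them assumes the openai package and an API key.

```python
# Sketch: reducing reasoning effort for a simple task.
# Lower effort means fewer (billed) reasoning tokens and a faster
# response; higher effort helps on complex tasks.
request_kwargs = {
    "model": "gpt-5-mini",
    "input": "What is 17 * 24?",
    "reasoning": {"effort": "low"},
}

# Would be sent with: client.responses.create(**request_kwargs)
print(request_kwargs["reasoning"]["effort"])
```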
The options here are "minimal", "low", "medium", and "high". For relatively simple tasks, it's often worth reducing the reasoning effort so that the model doesn't reason for longer than it needs to. Remember, those reasoning tokens still cost money!
9. Reasoning Summaries
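Here is a sketch of requesting a reasoning summary alongside the answer, using the "summary" key inside the reasoning dictionary ("auto" lets the API pick a summary style; as before, only the request arguments are assembled here).

```python
# Sketch: asking for a summary of the model's reasoning.
# The raw reasoning tokens are never exposed; "summary" requests a
# condensed account of the reasoning instead.
request_kwargs = {
    "model": "gpt-5-mini",
    "input": "Plan a three-step approach to debugging a flaky test.",
    "reasoning": {"effort": "medium", "summary": "auto"},
}

# Would be sent with: client.responses.create(**request_kwargs)
print(request_kwargs["reasoning"]["summary"])
```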
Although we can't access the reasoning tokens directly, we can request a summary of the reasoning the model used with the "summary" key.
10. Limiting Output Tokens
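A sketch of capping output tokens follows; the specific limit of 500 is an arbitrary illustration, and as before only the request arguments are assembled.

```python
# Sketch: capping total output tokens.
# Note: the cap counts reasoning tokens AND visible output tokens,
# so higher reasoning effort needs a more generous limit.
request_kwargs = {
    "model": "gpt-5-mini",
    "input": "Explain recursion in two sentences.",
    "reasoning": {"effort": "low"},
    "max_output_tokens": 500,  # arbitrary illustrative cap
}

# Would be sent with: client.responses.create(**request_kwargs)
print(request_kwargs["max_output_tokens"])
```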
To set an upper limit on the number of output tokens, we can set the max_output_tokens parameter. Importantly, this count also includes the reasoning tokens, so the limit we set has to account for the amount of reasoning we expect. For higher reasoning effort, we'll need to raise any max_output_tokens restrictions so the model can reason effectively and complete the task. Let's briefly summarize the three parameters we've looked at.
11. Summary
model, reasoning, and max_output_tokens are all ways to control model response quality, speed, and cost, and they're all related. For simpler tasks, it's worth starting with smaller models, less reasoning effort, and tighter token limits. Then, you can relax these parameters until you find the optimum for your application. For more complex tasks, start somewhere in the middle in terms of model size and reasoning effort, and experiment until you find the right combination. Then, set a max_output_tokens value suitable for the task.
12. Let's practice!
Time to experiment with these parameters in the exercises!