Agent Final Answer Validation
1. Agent Final Answer Validation
Welcome to the final lesson in the course!
2. Why Validation Matters

In the previous lesson, you learned how to debug your agents after something goes wrong. But debugging is only part of managing agents in production. Imagine you run a car dealership and use an agent to help customers shop online. A user asks, "Please give me recommendations for a family-friendly car for someone who also has pets." But instead of giving specific suggestions, the agent replies vaguely: "You could look into cars that are roomy and comfortable." This kind of answer is not very helpful. The customer still doesn't know which cars to consider, and they may decide to leave. To avoid this, smolagents lets you validate final answers before they're shown. Let's explore how.
3. Validating Agent Responses

Let's define a validation function, starting with a simple rule: make sure the agent gives a substantial response of at least 200 characters. This function accepts two arguments: the agent's final answer and its memory. If the final answer fails the rule, it raises an exception with a clear message. Otherwise, it returns True, meaning the final answer is valid.
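A minimal sketch of such a check (the function name and message wording are illustrative; the two-argument signature follows the pattern described above):

```python
def check_answer_length(final_answer, agent_memory):
    """Raise if the agent's final answer is shorter than 200 characters."""
    if len(str(final_answer)) < 200:
        raise Exception(
            "The final answer is too short. Provide a detailed, "
            "substantial response of at least 200 characters."
        )
    return True  # the answer passed the check
```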
4. Using Output Validation in Your Agent

Now, we can attach the validation function to an agent using the final_answer_checks parameter. Before responding, the agent runs the check_answer_length() validation. If the check fails, the agent automatically retries, using the exception message as feedback to produce a better answer.
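A sketch of that wiring, assuming a CodeAgent; the model class and the empty tool list are illustrative choices, not requirements:

```python
from smolagents import CodeAgent, InferenceClientModel

agent = CodeAgent(
    tools=[],  # add your dealership/search tools here
    model=InferenceClientModel(),  # illustrative model choice
    final_answer_checks=[check_answer_length],  # run before the answer is returned
)

agent.run(
    "Please give me recommendations for a family-friendly car "
    "for someone who also has pets."
)
```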
5. Meta-Evaluation: Using AI to Validate AI

But what if you need more advanced checks, like verifying the logic behind the answer? This is where the meta-evaluation approach becomes useful. This technique uses a separate LLM to check the agent's reasoning. Instead of writing many Python functions, you describe your business rules as prompts. The validation prompt in the example takes a reasoning process along with the agent's final answer. It returns TRUE if the agent's reasoning and final answer correctly solve the user's original question, and FALSE otherwise.
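A validation prompt along these lines captures that rule (the exact wording below is an illustrative assumption):

```python
validation_prompt = """You are evaluating an AI agent's work.

Agent reasoning steps:
{reasoning}

Agent final answer:
{answer}

Reply with a single word: TRUE if the reasoning and final answer
correctly solve the user's original question, FALSE otherwise."""
```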
6. Validating Reasoning with a Meta-Evaluator

Now, you can wrap the validation prompt into a validation function. We create an evaluator_model as our evaluating LLM. Then, we get the agent's reasoning steps using .get_succinct_steps(). We take the agent's reasoning steps and final answer, and insert them into the validation_prompt. Next, we create a ChatMessage with this prompt and set its role to 'user'. Lastly, we send this message to the evaluator model. If the response is FALSE, we know there's a logical problem and raise an exception. Otherwise, the validation passes.
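Putting those steps together, here is a sketch of the wrapper. It assumes the validation_prompt defined above, and that the evaluator model can be called directly with a list of ChatMessage objects:

```python
from smolagents import ChatMessage, InferenceClientModel

evaluator_model = InferenceClientModel()  # a separate LLM acting as the judge

def check_reasoning_accuracy(final_answer, agent_memory):
    """Ask an evaluator LLM whether the reasoning supports the final answer."""
    reasoning_steps = agent_memory.get_succinct_steps()
    prompt = validation_prompt.format(reasoning=reasoning_steps, answer=final_answer)
    message = ChatMessage(role="user", content=prompt)
    response = evaluator_model([message])
    if "FALSE" in str(response.content).upper():
        raise Exception(
            "The reasoning does not support the final answer; "
            "re-examine the steps and correct the response."
        )
    return True  # reasoning and answer were judged consistent
```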
7. Combining Multiple Validations

You can combine several validation strategies in the agent's final_answer_checks for better reliability. For example, here we are adding the two functions we created before, check_answer_length() and check_reasoning_accuracy(). With both checks active, the agent is more likely to catch and correct errors before the user ever sees them.
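Reusing the pieces defined above, the combined setup is a one-line change (a sketch; the model choice remains illustrative):

```python
agent = CodeAgent(
    tools=[],
    model=InferenceClientModel(),
    final_answer_checks=[check_answer_length, check_reasoning_accuracy],
)
```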
8. Designing Intelligent Systems

You've now built intelligent agents, debugged their behavior, and added safeguards to catch errors before they reach users. But more importantly, you've learned to think like a system designer. When outputs go wrong, you can trace them back to memory, prompts, or tools. When reasoning breaks, you know how to inspect and improve it. And when your system grows, you can scale quality control using agents to supervise other agents. The truth is, no agent is perfect. But with validation, memory, and smart debugging in place, you can build agents that learn, adapt, and improve over time.
9. Let's practice!

Now let's practice these advanced concepts!