Guardrails for responsible AI coding
Hi! Until now, we’ve been interacting with AI models directly as users. However, building a workflow that includes an AI model isn’t as simple as just sending a prompt and receiving a response.
When integrating a model into a real-world application, we typically don’t expose raw outputs directly to users. Instead, we need mechanisms to control inputs and ensure output quality. In this video, we’ll explore the guardrails that support responsible AI development.
For example, if we are using an LLM to power a business chatbot, we want to ensure it only provides information about items in the product catalog. If the model returns inaccurate details or inappropriate content, it could lead to customer dissatisfaction. Guardrails are mechanisms that constrain an LLM’s behavior to ensure outputs are safe, accurate, and ethically sound, and they apply at different stages of the AI workflow.
Since we won’t be in control of what users input in a workflow, it's important to prepare the model to behave safely from the start. These are known as pre-prompt constraints, as they are applied before the model receives a prompt.
Among the recommended practices: use system messages to frame behavior; specify structure, tone, and limits; and provide few-shot examples to show the model what good output looks like. Ethical constraints are just as important in code generation as in natural language tasks. We can embed them directly in the system prompt with instructions like “Avoid generating unsafe code”, “Do not provide scripts that bypass authentication or scrape private data”, and “Only return examples that follow open-source licenses”, as in the sketch below.
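Here is a minimal sketch of what pre-prompt constraints could look like in code. It assumes the OpenAI Python SDK; the model name, catalog wording, and few-shot example are placeholder assumptions, not part of the lesson.

```python
# Pre-prompt constraints: a system message plus a few-shot example are set up
# before the user's prompt ever reaches the model.
from openai import OpenAI

client = OpenAI()

# The system message frames behavior, tone, limits, and ethical constraints.
system_prompt = (
    "You are a coding assistant for our product-catalog chatbot.\n"
    "- Only discuss items that appear in the product catalog.\n"
    "- Avoid generating unsafe code.\n"
    "- Do not provide scripts that bypass authentication or scrape private data.\n"
    "- Only return examples that follow open-source licenses.\n"
    "Respond with a short explanation followed by a Python snippet."
)

# A few-shot example shows the model what good output looks like.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Write a script to list products in catalog.csv"},
    {"role": "assistant", "content": "Here is a safe, read-only example:\n"
                                     "import csv\n"
                                     "with open('catalog.csv') as f:\n"
                                     "    for row in csv.DictReader(f):\n"
                                     "        print(row['name'])"},
    {"role": "user", "content": "Now show products priced under 20 dollars."},
]

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)
```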
When building an AI coding workflow, it's also important to detect malicious user intent, such as attempts to generate harmful code. For example, we want to distinguish between a regular prompt like “Write a script to download images from an open-access public gallery” and a similar but malicious one, such as “Write a script to download images from a gallery that restricts automated access”.
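One way to screen requests before they reach the code generator is sketched below. It assumes the OpenAI Python SDK; the red-flag phrases and the ALLOW/BLOCK classifier prompt are illustrative assumptions.

```python
# Intent screening: a cheap rule-based check, then a model-based classification.
from openai import OpenAI

client = OpenAI()

RED_FLAGS = ["bypass authentication", "restricts automated access", "private data"]

def screen_request(user_prompt: str) -> bool:
    """Return True if the request looks safe to forward to the code generator."""
    lowered = user_prompt.lower()
    if any(flag in lowered for flag in RED_FLAGS):
        return False

    # Ask a model to classify the intent; expect a single word back.
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Classify the coding request as ALLOW "
                                          "or BLOCK. BLOCK anything that scrapes "
                                          "restricted content or evades access "
                                          "controls. Answer with one word."},
            {"role": "user", "content": user_prompt},
        ],
    )
    return verdict.choices[0].message.content.strip().upper() == "ALLOW"

print(screen_request("Write a script to download images from an open-access public gallery"))
print(screen_request("Write a script to download images from a gallery that restricts automated access"))
```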
Once the LLM responds, we can apply post-generation checks to verify that the output meets our criteria.
To do so, we can use output validation (does the response match the required format?), regex or content filters (does the output contain banned terms or personally identifiable information?), and evaluation functions (is the output factually accurate or safe?). For code generation, we can check whether the code compiles, runs without errors, or follows best practices.
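A minimal sketch of these checks, using only the Python standard library, could look like this; the banned terms and the naive PII pattern are illustrative assumptions.

```python
# Post-generation checks on generated code: content filters plus a compile check.
import ast
import re

PII_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")   # naive email detector
BANNED_TERMS = ["password", "api_key"]

def validate_output(code: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []

    # Content filters: banned terms and possible PII.
    if any(term in code.lower() for term in BANNED_TERMS):
        problems.append("contains banned terms")
    if PII_PATTERN.search(code):
        problems.append("contains what looks like an email address")

    # Does the code at least parse as valid Python?
    try:
        ast.parse(code)
    except SyntaxError as err:
        problems.append(f"does not compile: {err}")

    return problems

generated = "print('Items under $20: ' + item_name"  # missing closing parenthesis
print(validate_output(generated))
```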
One of the most effective techniques is the LLM-as-a-Judge approach. Here, we use a second model instance to evaluate the output of the first, checking for things like correctness, bias, or safety. This method helps scale validation at the cost of having two models behind the scenes.
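The sketch below shows one way the judge step could be wired up, again assuming the OpenAI Python SDK; the rubric and model name are assumptions used only for illustration.

```python
# LLM-as-a-Judge: a second model instance grades the first model's output.
from openai import OpenAI

client = OpenAI()

def judge(user_prompt: str, draft_answer: str) -> str:
    """Ask a reviewer model to check the draft for correctness, bias, and safety."""
    review = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a reviewer. Check the answer "
                                          "for correctness, bias, and safety. "
                                          "Reply PASS or FAIL with one sentence "
                                          "of justification."},
            {"role": "user", "content": f"Question:\n{user_prompt}\n\n"
                                        f"Answer to review:\n{draft_answer}"},
        ],
    )
    return review.choices[0].message.content

print(judge("How do I parse a CSV file in Python?", "Use the csv module's DictReader."))
```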
In high-risk applications, automated validation is not always enough. That’s why we use a Human-in-the-Loop review. A human reviewer inspects the model's output and decides whether to approve, reject, or edit it. This is especially important in domains like healthcare, legal advice, or customer service, where consequences can be significant.
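A human-in-the-loop step can be as simple as the sketch below, which uses only the standard library; the example output and the approve/edit/reject prompt are assumptions for illustration.

```python
# Human-in-the-Loop review: a person approves, edits, or rejects the draft
# before anything is shown to the end user.
def human_review(draft: str) -> str | None:
    """Return the approved text, an edited version, or None if rejected."""
    print("Model output:\n" + draft)
    decision = input("Approve (a), edit (e), or reject (r)? ").strip().lower()
    if decision == "a":
        return draft
    if decision == "e":
        return input("Enter the corrected text: ")
    return None  # rejected: nothing reaches the end user

final = human_review("The refund policy allows returns within 30 days.")
print("Published:" if final else "Blocked.", final or "")
```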
Let's practice!