AI agent guardrails - a quick guide



Guardrails are a critical part of any AI agent development project
As organisations increasingly adopt AI agents, guardrails are becoming more critical. There are two main types of guardrail, and the one that matters most is the one that gets missed.
Input guardrails check what users ask before the AI processes it. They block inappropriate requests, keep conversations on topic, and filter out attempts to manipulate the system. Most organisations cater for these.
Output guardrails check what the AI produces before showing it to users. This is the area that gets forgotten. And it's where the real risk actually sits. If you like analogies, think of a restaurant. Input guardrails are the menu, setting boundaries on what customers can order. Output guardrails are the chef checking the plate before it leaves the kitchen. You'd never send food out without looking at it first.
Air Canada's chatbot invented a refund policy, and when the case reached a tribunal in 2024 the company tried to argue the chatbot was a "separate legal entity" responsible for its own actions. The tribunal rejected this. Since then, OpenAI has faced multiple lawsuits after allegedly relaxing safeguards. Character.ai banned open-ended chats for under-18s. And we've recently seen several firms in trouble over allowing users to 'nudify' photos.
Several governments are now demanding output controls from major AI companies to protect against harmful content. If you're commissioning AI agents, ask whoever builds them: what happens between the AI generating a response and the user seeing it? That gap is where output guardrails live.
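To make that gap concrete, here is a minimal Python sketch of an agent wrapped in both kinds of guardrail. Everything in it is an assumption for illustration: the function names (check_input, check_output, respond), the keyword patterns and the fallback wording are hypothetical, and a real system would likely use policy-specific checks, classifiers or a moderation service rather than simple pattern matching.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical rules, for illustration only.
BLOCKED_INPUT_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]
UNVERIFIED_CLAIM_PATTERNS = [
    re.compile(r"\brefund\b", re.IGNORECASE),
]

REFUSAL = "Sorry, I can't help with that request."
FALLBACK = "I can't confirm that here. Please check our published policy."


@dataclass
class GuardrailResult:
    allowed: bool
    reason: str = ""


def check_input(user_message: str) -> GuardrailResult:
    """Input guardrail: runs before the model sees the message."""
    for pattern in BLOCKED_INPUT_PATTERNS:
        if pattern.search(user_message):
            return GuardrailResult(False, "possible manipulation attempt")
    return GuardrailResult(True)


def check_output(draft: str) -> GuardrailResult:
    """Output guardrail: runs after generation, before the user sees it."""
    for pattern in UNVERIFIED_CLAIM_PATTERNS:
        if pattern.search(draft):
            return GuardrailResult(False, "policy claim needs verification")
    return GuardrailResult(True)


def respond(user_message: str, generate: Callable[[str], str]) -> str:
    """Wrap any text generator with both guardrails."""
    if not check_input(user_message).allowed:
        return REFUSAL
    draft = generate(user_message)  # the model's unchecked response
    if not check_output(draft).allowed:
        return FALLBACK  # the plate is stopped before it leaves the kitchen
    return draft


if __name__ == "__main__":
    # Stub standing in for a real model call (an assumption, not a real API).
    def fake_model(prompt: str) -> str:
        return "You can claim a full refund within 90 days."

    print(respond("Can I get my money back?", fake_model))
```

In this sketch the user's question passes the input check, but the draft answer mentions a refund, so the output check swaps it for the fallback message. The point is the position of check_output: it sits between generation and display, which is the step to ask your builder about.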
Category
Insights
Written by

Julia Druck
Let's have a conversation.
No pressure. No lengthy pitch deck. Just a straightforward discussion about where you are with AI and whether we can help.
If we're not the right fit, we'll tell you. If you're not ready, we'll say so. Better to find that out in a 30-minute call than after signing a contract.






