FRAMEWORK
Designing AI agents that work in production.
Most AI agent projects stall between pilot and production. The gap isn't the technology. It's the design work that comes before building.
4 parts
Evidence-based gates
Validated on 3 agent types
Agent Design is a four-part framework for designing AI agents that work in production. The right questions, asked in the right order, so you build with confidence.
Three things make AI agents different from anything you've built before
The agent sounds right when it's wrong. A fabricated answer sounds exactly like a correct one. There's no error message, no crash, no red flag. The first time you find out is when a real situation goes wrong.
It works until it doesn't. An agent that performs well today can quietly degrade within months. The organisation changes, the data shifts, the context moves. Nothing visibly breaks.
Scale amplifies the gaps you already have. When a person makes a bad call, it affects one case. An agent with the same gap repeats it across every case it handles, with none of the natural circuit breakers a human team provides.
19%
Only 19% of organisations have scaled AI agents beyond pilots. The rest stall before production.
Source: Databricks State of AI Agents, 2026
AI has to earn its place
Code
Structured input, clear logic
"Is this invoice overdue?"
Code + AI
AI judges, code verifies
"Classify then validate"
AI agent
Unstructured input, clear logic
"Understand and respond"
Human
Unstructured input, unclear logic
"Is this genuinely unusual?"
Code
If the input is structured and the logic is clear, code is cheaper and more reliable. No AI needed. This is where most teams over-invest in AI when a simpler solution would do.
AI agent
If the input is messy but the decision logic is clear, that's where AI earns its place. In practice, most production agents combine AI judgement with code verification.
Human
If both the input and the logic are unclear, you need a person. The first thing we do is work out which parts of your process sit where on this spectrum.
Four parts, each building on the last
Agent Design is a sequence of four parts. Each answers a different question and produces something concrete. Between parts there's a gate where the evidence must support moving forward. Each gate builds confidence that what you're designing is worth building.
Part 1
Is this the right use case?
You already know you want to build an agent. This part validates whether this specific use case is the right one to invest in. We look at what the problem actually costs today, what alternatives exist, and where the business case holds.
Part 2
What does the process look like?
We map the process as it works today, including the unwritten rules and the tacit knowledge. We identify which steps genuinely need AI, which are better handled by code, and which need a human.
Part 3
How does the agent behave?
We design the architecture, guardrails, evaluation criteria, and human oversight per action. Every behaviour gets a testable specification. Every failure mode gets a designed response.
Part 4
What exactly gets built?
We translate everything into build contracts. System prompts, tool definitions, data schemas, guardrail enforcement. Precise enough that an engineer or coding agent can build without guessing at intent.
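As an illustration of what "precise enough to build without guessing" means, here is a hypothetical build contract for a single tool. The tool name, fields, and guardrail values are invented for this sketch; the framework does not prescribe this exact schema.

```python
# A hypothetical build contract: one tool definition with its input
# schema and guardrail settings spelled out, so an engineer (or a
# coding agent) has nothing to infer about intent.

lookup_invoice_tool = {
    "name": "lookup_invoice",
    "description": "Fetch an invoice by ID. Read-only; never mutates state.",
    "parameters": {
        "type": "object",
        "properties": {
            "invoice_id": {"type": "string", "pattern": "^INV-[0-9]{6}$"},
        },
        "required": ["invoice_id"],
    },
    "guardrails": {
        "requires_human_approval": False,  # per-action oversight decision
        "max_calls_per_session": 5,        # hard limit enforced in code
    },
}
```

Each tool an agent can use gets a contract like this, alongside the system prompt and data schemas, so the build phase is assembly rather than interpretation.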
Who Agent Design is for
Anyone designing an AI agent where getting it wrong has real consequences.
This is for you if…
✓
You're building an agent that needs to work reliably, not a demo
✓
You've tried building with AI and it worked in testing but failed in practice
✓
Your agent makes decisions that affect people, money, or reputation
✓
You want a structured approach, not trial and error
This isn't for you if…
✗
You need a chatbot added to a website by Friday
✗
You're looking for a no-code automation tool
✗
The stakes are low enough that getting it wrong doesn't matter