FRAMEWORK

Designing AI agents that work in production.

Most AI agent projects stall between pilot and production. The gap isn't the technology. It's the design work that comes before building.

Four parts · Evidence-based gates · Validated on three agent types

Agent Design is a four-part framework for designing AI agents that work in production. The right questions, asked in the right order, so you build with confidence.

WHY THIS IS DIFFERENT

Three things make AI agents different from anything you've built before

The agent sounds right when it's wrong. A fabricated answer sounds exactly like a correct one. There's no error message, no crash, no red flag. You find out only when a real situation goes wrong.

It works until it doesn't. An agent that performs well today may degrade quietly over three months. The organisation changes, the data shifts, the context moves. Nothing visibly breaks.

Scale amplifies the gaps you already have. When a person makes a bad call, it affects one case. An agent with the same gap repeats it across every case it handles, and there's no natural circuit breaker the way there is with a human team.

19%

Only 19% of organisations have scaled AI agents beyond pilots. The rest stall before production.

Source: Databricks State of AI Agents, 2026

AI FIT

AI has to earn its place

Code: structured input, clear logic. "Is this invoice overdue?"

Code + AI: AI judges, code verifies. "Classify then validate."

AI agent: unstructured input, clear decision logic. "Understand and respond."

Human: unstructured input, unclear logic. "Is this genuinely unusual?"

Code

If the input is structured and the logic is clear, code is cheaper and more reliable. No AI needed. This is where most teams over-invest in AI when a simpler solution would do.

AI agent

If the input is messy but the decision logic is clear, that's where AI earns its place. In practice, most production agents combine AI judgement with code verification.

Human

If both the input and the logic are unclear, you need a person. The first thing we do is work out which parts of your process sit where on this spectrum.
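The spectrum above can be sketched as a routing decision. This is a minimal illustration, not part of the framework itself; the field names and the `route` function are invented for the example:

```python
# Hypothetical routing over the fit spectrum. Field names are illustrative.
def route(task):
    """Decide who handles a task: code, an AI agent, or a human."""
    if task["structured_input"] and task["clear_logic"]:
        return "code"      # deterministic rules: cheaper, more reliable
    if task["clear_logic"]:
        return "ai_agent"  # messy input, but the decision rule is known
    return "human"         # unclear input and unclear logic
```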

HOW IT WORKS

Four parts, each building on the last

Agent Design is a sequence of four parts. Each one answers a different question and produces something concrete. Between each part there's a gate where the evidence needs to support moving forward. Each gate builds confidence that what you're designing is worth designing.

The Agent Design lifecycle

Part 1: Validation. Is this the right use case?

Part 2: Process & fit. What does the process look like?

Part 3: Agent design. How does the agent behave?

Part 4: Build spec. What exactly gets built?


Part 1

Is this the right use case?

You already know you want to build an agent. This part validates whether this specific use case is the right one to invest in. We look at what the problem actually costs today, what alternatives exist, and where the business case holds.

Part 2

What does the process look like?

We map the process as it works today, including the unwritten rules and the tacit knowledge. We identify which steps genuinely need AI, which are better handled by code, and which need a human.

Part 3

How does the agent behave?

We design the architecture, guardrails, evaluation criteria, and human oversight per action. Every behaviour gets a testable specification. Every failure mode gets a designed response.

Part 4

What exactly gets built?

We translate everything into build contracts. System prompts, tool definitions, data schemas, guardrail enforcement. Precise enough that an engineer or coding agent can build without guessing at intent.
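As an illustration of what "precise enough to build without guessing" means, here is a hypothetical build-contract fragment: one tool definition in the JSON-schema style common to agent frameworks. The tool name, fields, and cap are invented for the example:

```python
# A hypothetical build-contract fragment: one tool definition, in the
# JSON-schema style common to agent frameworks. Names and limits are invented.
refund_tool = {
    "name": "issue_refund",
    "description": "Issue a refund for an approved claim.",
    "parameters": {
        "type": "object",
        "properties": {
            "claim_id": {"type": "string"},
            "amount": {"type": "number", "maximum": 500},  # cap enforced in code
        },
        "required": ["claim_id", "amount"],
    },
    "oversight": "gated",  # human must approve before execution
}
```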

OUR PRINCIPLES

What makes this different

Agents decide, code enforces.

AI handles interpretation. Code handles verification. We separate these so the agent makes the call and code checks the result. This single principle prevents most production failures.
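A minimal sketch of the principle, assuming a hypothetical `classify_with_model` stand-in for the real model call:

```python
# "Agents decide, code enforces": the model proposes, deterministic code verifies.
ALLOWED_CATEGORIES = {"invoice", "complaint", "refund_request"}

def classify_with_model(text):
    """Stand-in for a real model call (hypothetical)."""
    return "refund_request"

def classify(text):
    label = classify_with_model(text)    # AI makes the call...
    if label not in ALLOWED_CATEGORIES:  # ...code checks the result
        raise ValueError(f"model returned unknown category: {label}")
    return label
```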

Build on the backbone, not the technology.

Before deciding anything about AI, we map the process backbone. The four to eight things that must happen to get from input to outcome, with no technology or roles attached. Only once you know what must happen can you decide how it happens and what carries it out.

Every behaviour is testable from the moment it's specified.

Every agent behaviour comes with a VERIFIED BY clause. If you can't write the verification, you haven't finished designing the behaviour.
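One possible shape for a behaviour with a VERIFIED BY clause, where the verification itself is executable. The spec fields and the invoice example are hypothetical:

```python
# Hypothetical behaviour spec: the VERIFIED BY clause is executable code,
# so the behaviour is testable from the moment it is written down.
behaviour = {
    "id": "B-07",
    "behaviour": "Quoted amounts always match the source invoice",
    "verified_by": lambda output, invoice: output["amount"] == invoice["total"],
}
```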

The unwritten rules are the most dangerous gap.

When we map how a process works today, we're drawing out the knowledge that lives in people's heads. If the agent doesn't have this knowledge, it produces outputs that are technically correct but organisationally wrong.

Everything traces back to something real.

Every guardrail traces to a failure mode. Every agent behaviour traces to a required outcome. Nothing floats free. When something goes wrong, you trace it back to the design decision and fix that.
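A sketch of what such a traceability record might look like; the guardrail and failure-mode entries are invented for illustration:

```python
# Hypothetical traceability records: every guardrail names the failure mode
# it exists to prevent, so nothing floats free.
guardrails = [
    {"id": "G-01", "rule": "refunds capped at 500",
     "traces_to": "FM-03: agent over-refunds on ambiguous claims"},
    {"id": "G-02", "rule": "replies never quote internal notes",
     "traces_to": "FM-07: confidential context leaks into output"},
]

# A guardrail without a failure mode is a design smell worth flagging.
untraced = [g["id"] for g in guardrails if not g["traces_to"]]
```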


Human oversight

Four levels, assigned per action

Each designed behaviour gets its own level of human involvement, based on the risk and consequence of getting it wrong.

Ordered from most autonomy to most oversight:

Autonomous: agent acts freely. Low consequence, high volume, verifiable.

Supervised: agent acts, human reviews after. Action taken, audit trail needed.

Gated: human must approve first. Higher risk, irreversible, or regulated.

Human-only: agent doesn't act at all. Outside agent scope entirely.

The question isn't "does this need human oversight?" It's "which actions need which level, and why?"
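A minimal sketch of per-action oversight enforced in code; the action names and approval flow are illustrative:

```python
# Hypothetical per-action oversight levels, checked before any action runs.
OVERSIGHT = {
    "draft_reply":   "autonomous",   # low consequence, verifiable
    "send_reply":    "supervised",   # acts, then logged for human review
    "issue_refund":  "gated",        # human must approve first
    "close_account": "human_only",   # outside agent scope entirely
}

def execute(action, do_it, approved=False):
    """Run an action only if its oversight level allows it."""
    level = OVERSIGHT[action]
    if level == "human_only":
        return "escalated to human"
    if level == "gated" and not approved:
        return "awaiting approval"
    result = do_it()
    if level == "supervised":
        audit_entry = f"{action} -> {result}"  # would be persisted for review
    return result
```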

IS THIS RIGHT FOR YOU

Who Agent Design is for

Anyone designing an AI agent where getting it wrong has real consequences.

This is for you if…

You're building an agent that needs to work reliably, not a demo

You've tried building with AI and it worked in testing but failed in practice

Your agent makes decisions that affect people, money, or reputation

You want a structured approach, not trial and error

This isn't for you if…

You need a chatbot added to a website by Friday

You're looking for a no-code automation tool

The stakes are low enough that getting it wrong doesn't matter

Ready to design your agent?