FRAMEWORK

Designing AI agents that work in production.

Most AI agent projects stall between pilot and production. The gap isn't the technology. It's the design work that comes before building.

Four parts · Evidence-based gates · Validated on three agent types

Agent Design is a four-part framework for designing AI agents that work in production. The right questions, asked in the right order, so you build with confidence.

WHY THIS IS DIFFERENT

Three things make AI agents different from anything you've built before

The agent sounds right when it's wrong. A fabricated answer sounds exactly like a correct one. There's no error message, no crash, no red flag. You find out only when a real situation goes wrong.

It works until it doesn't. An agent that performs well today may degrade quietly over three months. The organisation changes, the data shifts, the context moves. Nothing visibly breaks.

Scale amplifies the gaps you already have. When a person makes a bad call, it affects one case. An agent with the same gap repeats it across every case it handles, and there's no natural circuit breaker the way there is with a human team.

19%

Only 19% of organisations have scaled AI agents beyond pilots. The rest stall before production.

Source: Databricks State of AI Agents, 2026

AI FIT

AI has to earn its place

Code: structured input, clear logic. "Is this invoice overdue?"

Code + AI: AI judges, code verifies. "Classify then validate."

AI agent: unstructured input, clear decision logic. "Understand and respond."

Human: unstructured input, unclear logic. "Is this genuinely unusual?"

Code

If the input is structured and the logic is clear, code is cheaper and more reliable. No AI needed. This is where most teams over-invest in AI when a simpler solution would do.

AI agent

If the input is messy but the decision logic is clear, that's where AI earns its place. In practice, most production agents combine AI judgement with code verification.

Human

If both the input and the logic are unclear, you need a person. The first thing we do is work out which parts of your process sit where on this spectrum.
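The spectrum above can be sketched as a routing decision. This is a minimal illustration, not part of the framework itself; the field names and the `route` function are invented for the example:

```python
# Hypothetical routing over the fit spectrum. Field names are illustrative.
def route(task):
    """Decide who handles a task: code, an AI agent, or a human."""
    if task["structured_input"] and task["clear_logic"]:
        return "code"      # deterministic rules: cheaper, more reliable
    if task["clear_logic"]:
        return "ai_agent"  # messy input, but the decision rule is known
    return "human"         # unclear input and unclear logic
```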

HOW IT WORKS

Four parts, each building on the last

Agent Design is a sequence of four parts. Each one answers a different question and produces something concrete. Between each part there's a gate where the evidence needs to support moving forward. Each gate builds confidence that what you're designing is worth designing.

The Agent Design lifecycle

Part 1: Validation. Is this the right use case?

Part 2: Process & fit. What does the process look like?

Part 3: Agent design. How does the agent behave?

Part 4: Build spec. What exactly gets built?


Part 1

Is this the right use case?

You already know you want to build an agent. This part validates whether this specific use case is the right one to invest in. We look at what the problem actually costs today, what alternatives exist, and where the business case holds.

Part 2

What does the process look like?

We map the process as it works today, including the unwritten rules and the tacit knowledge. We identify which steps genuinely need AI, which are better handled by code, and which need a human.

Part 3

How does the agent behave?

We design the architecture, guardrails, evaluation criteria, and human oversight per action. Every behaviour gets a testable specification. Every failure mode gets a designed response.

Part 4

What exactly gets built?

We translate everything into build contracts. System prompts, tool definitions, data schemas, guardrail enforcement. Precise enough that an engineer or coding agent can build without guessing at intent.
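As an illustration of what "precise enough to build without guessing" means, here is a hypothetical build-contract fragment: one tool definition in the JSON-schema style common to agent frameworks. The tool name, fields, and cap are invented for the example:

```python
# A hypothetical build-contract fragment: one tool definition, in the
# JSON-schema style common to agent frameworks. Names and limits are invented.
refund_tool = {
    "name": "issue_refund",
    "description": "Issue a refund for an approved claim.",
    "parameters": {
        "type": "object",
        "properties": {
            "claim_id": {"type": "string"},
            "amount": {"type": "number", "maximum": 500},  # cap enforced in code
        },
        "required": ["claim_id", "amount"],
    },
    "oversight": "gated",  # human must approve before execution
}
```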

OUR PRINCIPLES

What makes this different

Agents decide, code enforces.

AI handles interpretation. Code handles verification. We separate these so the agent makes the call and code checks the result. This single principle prevents most production failures.
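A minimal sketch of the principle, assuming a hypothetical `classify_with_model` stand-in for the real model call:

```python
# "Agents decide, code enforces": the model proposes, deterministic code verifies.
ALLOWED_CATEGORIES = {"invoice", "complaint", "refund_request"}

def classify_with_model(text):
    """Stand-in for a real model call (hypothetical)."""
    return "refund_request"

def classify(text):
    label = classify_with_model(text)    # AI makes the call...
    if label not in ALLOWED_CATEGORIES:  # ...code checks the result
        raise ValueError(f"model returned unknown category: {label}")
    return label
```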

Build on the backbone, not the technology.

Before deciding anything about AI, we map the process backbone. The four to eight things that must happen to get from input to outcome, with no technology or roles attached. Only once you know what must happen can you decide how it happens and what carries it out.

Every behaviour is testable from the moment it's specified.

Every agent behaviour comes with a VERIFIED BY clause. If you can't write the verification, you haven't finished designing the behaviour.
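One possible shape for a behaviour with a VERIFIED BY clause, where the verification itself is executable. The spec fields and the invoice example are hypothetical:

```python
# Hypothetical behaviour spec: the VERIFIED BY clause is executable code,
# so the behaviour is testable from the moment it is written down.
behaviour = {
    "id": "B-07",
    "behaviour": "Quoted amounts always match the source invoice",
    "verified_by": lambda output, invoice: output["amount"] == invoice["total"],
}
```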

The unwritten rules are the most dangerous gap.

When we map how a process works today, we're drawing out the knowledge that lives in people's heads. If the agent doesn't have this knowledge, it produces outputs that are technically correct but organisationally wrong.

Everything traces back to something real.

Every guardrail traces to a failure mode. Every agent behaviour traces to a required outcome. Nothing floats free. When something goes wrong, you trace it back to the design decision and fix that.
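A sketch of what such a traceability record might look like; the guardrail and failure-mode entries are invented for illustration:

```python
# Hypothetical traceability records: every guardrail names the failure mode
# it exists to prevent, so nothing floats free.
guardrails = [
    {"id": "G-01", "rule": "refunds capped at 500",
     "traces_to": "FM-03: agent over-refunds on ambiguous claims"},
    {"id": "G-02", "rule": "replies never quote internal notes",
     "traces_to": "FM-07: confidential context leaks into output"},
]

# A guardrail without a failure mode is a design smell worth flagging.
untraced = [g["id"] for g in guardrails if not g["traces_to"]]
```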


Human oversight

Four levels, assigned per action

Each designed behaviour gets its own level of human involvement, based on the risk and consequence of getting it wrong.

Ordered from most autonomy to most oversight:

Autonomous: agent acts freely. Low consequence, high volume, verifiable.

Supervised: agent acts, human reviews after. Action taken, audit trail needed.

Gated: human must approve first. Higher risk, irreversible, or regulated.

Human-only: agent doesn't act at all. Outside agent scope entirely.

The question isn't "does this need human oversight?" It's "which actions need which level, and why?"
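A minimal sketch of per-action oversight enforced in code; the action names and approval flow are illustrative:

```python
# Hypothetical per-action oversight levels, checked before any action runs.
OVERSIGHT = {
    "draft_reply":   "autonomous",   # low consequence, verifiable
    "send_reply":    "supervised",   # acts, then logged for human review
    "issue_refund":  "gated",        # human must approve first
    "close_account": "human_only",   # outside agent scope entirely
}

def execute(action, do_it, approved=False):
    """Run an action only if its oversight level allows it."""
    level = OVERSIGHT[action]
    if level == "human_only":
        return "escalated to human"
    if level == "gated" and not approved:
        return "awaiting approval"
    result = do_it()
    if level == "supervised":
        audit_entry = f"{action} -> {result}"  # would be persisted for review
    return result
```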

IS THIS RIGHT FOR YOU

Who Agent Design is for

Anyone designing an AI agent where getting it wrong has real consequences.

This is for you if…

You're building an agent that needs to work reliably, not a demo

You've tried building with AI and it worked in testing but failed in practice

Your agent makes decisions that affect people, money, or reputation

You want a structured approach, not trial and error

This isn't for you if…

You need a chatbot added to a website by Friday

You're looking for a no-code automation tool

The stakes are low enough that getting it wrong doesn't matter

Ready to design your agent?