TRAINING COURSE

Testing & Evaluating AI Agents

AI agents fail differently from traditional software. This programme equips your team with the evaluation frameworks, failure-pattern recognition, and operational disciplines needed to ensure agents perform reliably.

1 Day

8-20 Participants

Online or in-person

No Prerequisites

For product managers, technology leaders, quality and compliance professionals, and technical staff managing or building AI agent systems.

WHY THIS MATTERS

The Reliability Gap

AI agents produce outputs that look entirely reasonable while being subtly wrong — no error messages, no crashes, just confident responses containing errors

Most organisations deploying AI agents have no systematic way to know whether those agents are producing reliable work

The capability that makes AI agents valuable is also what makes them risky — and failures only surface after damage is done

40%

of agentic AI projects will be scrapped by 2027 due to escalating costs and unclear business value

Source: Gartner, June 2025

IS THIS RIGHT FOR YOU

Who Should Attend

Product managers, technology leaders, quality and compliance professionals, and technical staff (beginner to intermediate) managing, commissioning, or building AI agent systems.

This is for you if…

You're deploying or commissioning AI agents and need to know they're producing reliable work

You want to understand how AI agents fail and how to catch problems before they reach clients

You need to build evaluation processes that improve both the agent and the evaluations over time

You want practical frameworks for continuous monitoring and production operations, not just pre-launch testing

This isn't for you if…

You're looking for deep machine learning or model training skills (this is about evaluating agent outputs, not building models)

Your organisation has no current or planned AI agent deployments

You want self-paced e-learning rather than live facilitation with a running case study

AFTER THIS TRAINING

What You'll Be Able To Do

01

Understand

What AI agents are, how they work, and why they require a fundamentally different approach to quality assurance

02

Classify

Agent outputs and match each type to the right quality check — evaluation, not just testing

03

Recognise

The specific ways AI agents fail, including failures that are invisible until they cause damage

04

Design

Layered quality and security checks that catch problems before they reach clients

05

Build

Evaluation processes that improve both the agent and the evaluations over time

06

Instrument

Agents for continuous monitoring in production, detect drift, and respond before quality degrades

07

Assess

Your organisation's evaluation maturity and build a concrete action plan for what to build next

08

Diagnose

Whether quality issues are specification problems or generalisation problems — and fix the right thing first
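The "Classify" outcome above — matching each output type to the right quality check — can be sketched in a few lines. This is an illustrative sketch only (all names and example values are hypothetical): exact outputs such as dates or totals get deterministic assertions, while judgment outputs get rubric-based scoring, where a real system might use a calibrated LLM judge instead of the keyword stand-in shown here.

```python
# Sketch: match each agent output type to the right kind of quality check.
# Exact outputs -> deterministic assertion; judgment outputs -> rubric score.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Check:
    name: str
    run: Callable[[str], bool]

def exact_check(expected: str) -> Check:
    # Deterministic: a date, an ID, or a total either matches or it doesn't.
    return Check("exact", lambda output: output.strip() == expected)

def rubric_check(required_terms: list[str], min_hits: int) -> Check:
    # Judgment proxy: a real system might use a calibrated LLM judge here;
    # this stand-in scores coverage of required concepts.
    def run(output: str) -> bool:
        hits = sum(term.lower() in output.lower() for term in required_terms)
        return hits >= min_hits
    return Check("rubric", run)

checks = {
    "contract_expiry_date": exact_check("2026-03-31"),
    "risk_summary": rubric_check(["termination", "liability", "notice period"], min_hits=2),
}

agent_outputs = {
    "contract_expiry_date": "2026-03-31",
    "risk_summary": "Flags a 90-day notice period and an uncapped liability clause.",
}

results = {field: checks[field].run(out) for field, out in agent_outputs.items()}
print(results)
```

The point of the split is that a failed exact check is a bug, while a failed rubric check is a signal to investigate — they route to different fixes.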

THE CURRICULUM

Programme Overview

7 modules. 1 day. Real capability.

1
Understanding AI Agents
50 minutes — What agents are and why they need evaluation

Build the foundations: what AI agents are, how they differ from chatbots and copilots, and why they require a fundamentally different approach to quality assurance.

What AI agents are and how they differ from chatbots, copilots, and traditional automation
Real-world use cases: contract review, due diligence, client onboarding, regulatory compliance
How AI agents work: LLMs, non-determinism, and fabrication tendencies
Agent architecture: tools, memory, reasoning, and orchestration
The reliability gap: why an agent that can do something is not the same as one that will do it consistently
Key platforms, frameworks, and the current adoption landscape
Exercise: "Case Study Introduction" — meet the professional services agent you will evaluate throughout the day
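The non-determinism point in the module above has a direct practical consequence: the same request can yield differently worded outputs, so reliability checks must assert properties of the answer rather than an exact string. A minimal sketch (the fake_agent below is a stand-in for a real model call, and the phrasings are invented):

```python
# Sketch: non-deterministic outputs call for property-based checks,
# not exact-string assertions. fake_agent simulates run-to-run
# variation in phrasing while the underlying facts stay constant.

PHRASINGS = [
    "The contract auto-renews on 2026-03-31 unless 90 days' notice is given.",
    "Unless notice is served 90 days ahead, renewal occurs on 2026-03-31.",
]

def fake_agent(prompt: str, run: int) -> str:
    # Deterministic stand-in: alternates phrasing to mimic variation.
    return PHRASINGS[run % 2]

def check_properties(answer: str) -> bool:
    # Property-based check: the key facts must appear; wording may vary.
    return "2026-03-31" in answer and "90 days" in answer

outputs = [fake_agent("When does the contract renew?", run=r) for r in range(5)]
assert len(set(outputs)) > 1                       # wording varies run to run
assert all(check_properties(o) for o in outputs)   # but the facts hold
print("all runs pass the property check")
```

An exact-match assertion would fail on half these runs despite every answer being correct — which is the reliability gap the module describes.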

Full detailed agenda available on request

THE EXPERIENCE

How the Day Works

This is live, facilitated training built around a realistic professional services case study that runs through every module. Your team learns by applying evaluation frameworks to real scenarios, not sitting through presentations.

~60%

Frameworks & Real-World Examples

Evaluation concepts, failure patterns, and quality check design explained clearly with real-world examples from professional and financial services.

~40%

Practical Exercises

Hands-on exercises building on a professional services case study: classifying outputs, mapping failure patterns, designing quality checks, conducting error analysis, and building evaluation criteria.

1 Case Study

Running Throughout

A professional services case study evolves through every module. You'll classify its outputs, map its failure points, design its quality checks, and build its monitoring plan — so you leave with a complete, worked example.

Available online (via Zoom/Teams) or in-person at your location

All materials, frameworks, and reference guides included

No prior technical knowledge required — Module 1 establishes the necessary foundations

Groups of 8–20 for meaningful discussion and peer learning

WHY SERPIN

What Makes This Different

Evaluation, Not Just Testing

Goes beyond traditional pass/fail testing to teach the evaluation discipline that AI agents actually require: measuring quality across exact and judgment outputs, calibrating AI judges, and building layered quality architectures.
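"Calibrating AI judges", mentioned above, has a simple core idea that a short sketch can show: before an automated judge is trusted to gate outputs, measure its agreement with human labels on a small gold set. Everything below is hypothetical — judge() is a keyword heuristic standing in for a real LLM judge call, and the gold set is invented.

```python
# Sketch: calibrate an automated judge against human labels before
# trusting it. judge() is a placeholder; in practice it would call
# an LLM with a scoring prompt.

def judge(output: str) -> bool:
    # Placeholder heuristic standing in for an LLM judge verdict.
    return "liability" in output.lower()

# Small gold set: (agent output, human verdict) pairs.
gold_set = [
    ("Flags uncapped liability in clause 12.", True),
    ("Summary omits the indemnity terms.", False),
    ("Notes a liability cap of GBP 1m.", True),
    ("States the renewal date only.", False),
]

agree = sum(judge(text) == label for text, label in gold_set)
agreement_rate = agree / len(gold_set)
print(f"judge/human agreement: {agreement_rate:.0%}")
```

Only once agreement on the gold set is high enough for the use case does the judge graduate to scoring production outputs — and the gold set itself grows as new failure cases are labelled, which is how the evaluations improve alongside the agent.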

Case Study Throughout

Not abstract theory — a professional services agent case study runs through every module. You'll classify its outputs, map its failures, design its checks, and build its monitoring plan. You leave with a complete, worked example.

Actionable Roadmap

You won't just learn frameworks — you'll build a 30/60/90-day evaluation action plan for your organisation. Assess your current maturity, identify what to build next, and leave with specific next steps, not generic recommendations.

COMMON QUESTIONS

Frequently Asked Questions

Do I need technical AI knowledge to attend?

How is this different from traditional software testing training?

What if we haven't deployed AI agents yet?

What do participants actually take away?

Can you tailor this to our organisation?

Ready to ensure your AI agents perform reliably?

Have questions? Email training@serpin.ai