How We Designed a Zero-Fabrication Research Agent

We’ve all seen how LLMs can fabricate content. For many applications this is fine. In creative applications, it’s a benefit. But when accuracy is essential, fabrication is a real problem.

Research is a good example. AI readily produces citations, statistics and quotes that look plausible. But often the sources don't exist, or are on unrelated topics (that policy report link is actually a botany article). Asking the LLM to self-check doesn't help: it will happily confirm that it has verified its sources while still fabricating them.

In many cases, accuracy is non-negotiable: legal research that informs client advice, consulting reports that shape decisions, regulatory filings and so on. Fabrications can trigger compliance failures, reputational damage or worse.
As professional services firms increasingly deploy AI agents to boost productivity, these risks only grow.

We needed to build a research agent to gather and synthesise insights from the internet and private documents. We assumed fabrication would happen unless we actively prevented it, and built a multi-layered approach to make fabricated data sources impossible:

Code Gets the Data, Not the AI Agent
The agent tells the code what it wants, and the code retrieves it. The agent never accesses the data directly, so it can only work with content that actually exists.
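
As a rough sketch of that boundary (Python, with a hypothetical run_search helper standing in for whatever search API sits behind it), the agent's request is just structured data and all retrieval happens in ordinary code:

    # Illustrative sketch only: the agent emits a structured request,
    # code performs the retrieval. run_search is a hypothetical stand-in
    # for the real search API.
    import requests

    def execute_search_request(agent_request: dict) -> list[dict]:
        """Fulfil a search the agent asked for; the agent never fetches anything itself."""
        results = run_search(agent_request["query"])  # hypothetical search API call
        sources = []
        for result in results:
            response = requests.get(result["url"], timeout=30)  # code downloads the page
            sources.append({"url": result["url"], "text": response.text})
        return sources  # only genuinely fetched content is ever shown to the agent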

Indexed Selection
When the AI agent refers to a data source, it outputs only index numbers (e.g. "quote 2 from citation 3") and code extracts the actual content from the source. The agent chooses the options that best meet the brief, a task LLMs are good at, but it cannot generate the reference text itself.
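
A minimal sketch of the resolution step, assuming the fetched sources are held in a simple list of dicts (the field names here are illustrative, not our actual schema):

    # The agent's output contains only indices such as
    # {"citation": 3, "quote": 2}; code turns them into real text.
    def resolve_reference(sources: list[dict], citation: int, quote: int) -> str:
        try:
            return sources[citation]["quotes"][quote]
        except (IndexError, KeyError, TypeError):
            # The agent referenced something that does not exist; reject rather than guess.
            raise ValueError(f"citation {citation}, quote {quote} does not exist")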

Exact Quote Matching in Code
When the agent claims to quote a source, we verify that those exact words appear in the fetched content using character-level matching. LLMs paraphrase frequently, even when purporting to quote precisely.
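
A simplified version of that check, collapsing whitespace so line breaks in the fetched page don't cause false negatives; the exact normalisation rules in production may differ:

    import re

    def quote_appears_verbatim(quote: str, source_text: str) -> bool:
        # Collapse runs of whitespace; everything else must match character for character.
        def normalise(s: str) -> str:
            return re.sub(r"\s+", " ", s).strip()
        return normalise(quote) in normalise(source_text)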

Real-time Feedback During Generation
When a quote fails validation, the agent receives immediate guidance: "This is failed attempt 2, you have 1 more attempt before failure." Research shows this external grounding improves the agent's ability to self-refine.
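
Sketched as a retry loop, reusing the matching check above; agent.propose_quote and agent.give_feedback are hypothetical stand-ins for the real agent interface:

    MAX_ATTEMPTS = 3

    def request_verified_quote(agent, source_text: str) -> str | None:
        for attempt in range(1, MAX_ATTEMPTS + 1):
            quote = agent.propose_quote(source_text)  # hypothetical agent call
            if quote_appears_verbatim(quote, source_text):
                return quote
            agent.give_feedback(  # hypothetical feedback channel
                f"Quote not found verbatim in the source. This is failed attempt {attempt}; "
                f"you have {MAX_ATTEMPTS - attempt} attempt(s) left before failure."
            )
        return None  # give up rather than accept an unverifiable quote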

Flexible Targets Prevent Pressure to Fabricate
In early versions of our agent we required a minimum number of sources. When the agent struggled to find enough, it tried to invent some to fill the gap. Relaxing the targets, so the agent could return fewer sources when fewer existed, removed that pressure.
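
The difference is easiest to see in the instructions themselves; the wording below is illustrative rather than our exact prompt:

    # A rigid minimum invites the agent to pad with invented sources;
    # a flexible target removes that incentive.
    RIGID_TARGET = "You must provide at least 5 supporting sources."
    FLEXIBLE_TARGET = (
        "Provide up to 5 supporting sources. If fewer genuinely relevant sources "
        "exist, return only those and say so. Never invent a source."
    )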

With this approach, fabrication dropped to zero, both for internet searches and for RAG integration. The benefit is a complete audit trail and greater trust: every claim traces back to a specific document, page and paragraph. That delivers better productivity and simpler governance.

Every point where an LLM generates text is a trust boundary. Validation at one stage doesn't protect downstream. Each boundary needs its own controls.

For now, fabrication remains an ever-present risk. Fortunately, it can be eliminated with the right techniques, with governance and auditability built in. That greatly increases the value of any AI agent-based system where accurate, trustworthy output is needed.

Written by Scott Druck
