Before Building Your First AI Agent: What Nobody Tells You About Data

One pattern keeps repeating across organisations right now. An operations team identifies a specific problem that seems perfect for an AI agent: incident triage, automatic escalation, resolution of recurring alerts. The team has genuine domain expertise, motivation, and access to modern tools. They start building. Within a few weeks, the same obstacle appears every time: the data isn’t where it should be, it doesn’t mean what it appears to mean, or it can’t be trusted enough to let an AI make decisions based on it.

It’s not a technology problem. It’s a data foundations problem.

This article isn’t about how to build AI agents. It’s about what needs to be in place before you start, so that when the agent reaches production, it doesn’t become a source of incorrect decisions that are difficult to trace.

The profile adopting agents fastest right now

Operations teams are probably the group that can extract the most value from AI agents in the short term. They understand their processes in depth, they have an endless list of repetitive tasks that consume time, and they are able to define precisely what a good decision looks like versus a bad one.

But precisely because of that, they are also the most exposed to risk: they build on top of operational data that was never designed to be consumed by automated systems. Logs from legacy systems, records stored in inconsistent formats, statuses that only make sense if you understand the historical context of the process. Data that an experienced operator interprets correctly because they have spent years working with it. Data that an agent will interpret literally, without that context.

The most common outcome isn’t a visible catastrophic failure. It’s something more subtle and more dangerous: an agent that works well most of the time (say, 80%) and fails silently the rest of the time. Without the right mechanisms in place, nobody knows until the issue has already escalated.

What Does It Mean to Have AI-Ready Data?

When we talk about preparing data for an AI agent, we’re not talking about having a perfectly organised data lake or having completed a three-year data strategy. We’re talking about answering four specific questions before writing the first line of code for the agent:

Where does the data the agent will use come from?

It sounds obvious, but in practice many teams start building on data sources they don’t fully control: third-party system APIs, manual exports, operational databases with no availability SLAs. If the agent depends on a source that may become unavailable or change format without warning, the agent will inherit that fragility.
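One way to make that fragility visible is to check every record from a source before the agent reads it. The sketch below is a minimal illustration in Python, not a reference implementation. It assumes a hypothetical record format in which each record carries a `fetched_at` timestamp; the field names and the 15-minute freshness limit are invented for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical expectations for one source; adjust per use case.
REQUIRED_FIELDS = {"incident_id", "status", "fetched_at"}
MAX_AGE = timedelta(minutes=15)


def check_source_record(record: dict, now: datetime) -> list:
    """Return a list of problems; an empty list means the record is usable."""
    problems = []
    missing = REQUIRED_FIELDS - set(record)
    if missing:
        # An upstream format change usually shows up as missing fields.
        problems.append(f"missing fields: {sorted(missing)}")
    fetched_at = record.get("fetched_at")
    if fetched_at is not None and now - fetched_at > MAX_AGE:
        # Stale data often means the source was unavailable for a while.
        problems.append("stale record")
    return problems


now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = {"incident_id": "INC-1", "status": "open",
         "fetched_at": now - timedelta(minutes=5)}
renamed = {"id": "INC-2", "status": "open",
           "fetched_at": now - timedelta(hours=2)}

print(check_source_record(fresh, now))    # no problems
print(check_source_record(renamed, now))  # renamed field and stale data
```

Returning a list of problems rather than raising on the first one lets the team see everything that is wrong with a source in a single pass.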

What does each field mean, and who decides that?

Operational data is full of implicit conventions. A field called “status” may have twelve possible values, three of which are technically equivalent but are used in different systems for historical reasons. Without at least a basic data dictionary, the agent will learn those inconsistencies and reproduce them in its decisions.
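A basic data dictionary can start as a simple lookup table that the agent must go through before it interprets a value. The following Python sketch assumes a hypothetical “status” field; the raw values and canonical names are made up for illustration and would come from the business owner of the data in practice.

```python
# Hypothetical data dictionary for a "status" field: each raw value seen
# in the source systems maps to one canonical meaning.
STATUS_DICTIONARY = {
    "closed": "resolved",
    "done": "resolved",
    "resolved": "resolved",
    "open": "open",
    "new": "open",
    "wip": "in_progress",
}


def canonical_status(raw: str) -> str:
    """Map a raw status to its canonical value, rejecting unknown ones."""
    key = raw.strip().lower()
    if key not in STATUS_DICTIONARY:
        # Failing loudly is deliberate: an unmapped value should reach
        # the data owner instead of being guessed at by the agent.
        raise ValueError(f"unknown status: {raw!r}")
    return STATUS_DICTIONARY[key]


print(canonical_status("Done"))      # resolved
print(canonical_status(" CLOSED "))  # resolved
```

The important design choice is that technically equivalent values collapse to one meaning before the agent sees them, and anything outside the dictionary is rejected rather than passed through.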

Who can access what, and under what conditions?

AI agents are not human users, but they access data with exactly the same privacy and compliance implications. In regulated sectors such as airlines, banking, insurance or healthcare, this isn’t optional. An agent accessing passenger information to resolve operational incidents requires exactly the same level of access control as any production system. If this isn’t defined from the outset, it becomes technical and regulatory debt that slows deployment.
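In code, the simplest form of this is an allowlist of fields per agent role, applied before any record reaches the agent. The Python sketch below is illustrative only: the role name, the field names and the policy structure are assumptions, and a real deployment would rely on the access controls of the underlying platform rather than on application code alone.

```python
# Hypothetical policy: which fields each agent role may read.
FIELD_POLICY = {
    "triage_agent": {"incident_id", "flight_number", "status"},
}


def visible_fields(role: str, record: dict) -> dict:
    """Return only the fields the given role is allowed to read."""
    allowed = FIELD_POLICY.get(role, set())  # unknown roles see nothing
    return {k: v for k, v in record.items() if k in allowed}


record = {
    "incident_id": "INC-7",
    "flight_number": "XX123",
    "status": "open",
    "passenger_name": "Jane Doe",  # personal data, not in the allowlist
}

print(visible_fields("triage_agent", record))
print(visible_fields("unknown_agent", record))
```

Defaulting to an empty allowlist means a new or misconfigured agent sees nothing until someone grants it access explicitly.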

How will we know if the agent is making decisions based on incorrect data?

This is the question fewest teams ask before they start, and it is the most important one. Observability for an agent isn’t just about monitoring whether it responds or not. It’s about being able to trace, for any decision it makes, what data it consulted, when it consulted it, and what values it used. Without that, debugging incorrect behaviour becomes almost impossible.
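A decision trace of this kind can be sketched as a small record that collects every read the agent performs and is written out with the final outcome. The Python example below is a minimal sketch under assumptions: the class name `DecisionTrace`, its fields, and the source name `alerts_db` are hypothetical, and a production system would send the result to its existing logging or tracing stack.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class DecisionTrace:
    """Collects what data was read, when, and with which value."""
    decision_id: str
    reads: list = field(default_factory=list)
    outcome: str = ""

    def record_read(self, source, field_name, value, read_at=None):
        # Default to the current UTC time when no timestamp is supplied.
        read_at = read_at or datetime.now(timezone.utc)
        self.reads.append({
            "source": source,
            "field": field_name,
            "value": value,
            "read_at": read_at.isoformat(),
        })

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


trace = DecisionTrace("dec-001")
trace.record_read("alerts_db", "status", "open",
                  datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc))
trace.outcome = "escalate"
print(trace.to_json())
```

With one such record per decision, investigating an incorrect escalation starts from the exact values the agent saw rather than from the current state of the source.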

Data Governance Isn’t Bureaucracy. It’s What Makes Scale Possible.

There is a widespread perception among business teams that data governance is an IT initiative that slows things down. In the context of AI agents, that perception is particularly costly.

An agent in production makes decisions continuously and autonomously. Every incorrect decision carries a cost: operational, financial, or in terms of user trust. Unlike a manual process, where an operator can correct issues as they arise, an agent scales errors just as quickly as it scales success.

This doesn’t mean organisations need to complete a two-year data governance programme before launching their first agent. It means identifying, for the specific use case, which data is critical, who owns it, and what level of quality is acceptable. It’s an exercise that can be completed in days if the scope is well defined.

The difference between AI initiatives that reach production and those that remain trapped in perpetual pilot mode rarely lies in the model or the platform. It lies in whether the team did this preparatory work on the data.

What should a CIO or CDO do before authorising development?

If you’re in a decision-making position regarding whether to launch an AI agent initiative within your organisation, there are three concrete actions that make the difference between a pilot that scales and one that doesn’t:

1. Identify the owner of every data source the agent will consume. Not the technical owner of the system, but the business stakeholder who can answer whether the data is accurate and complete. If that role doesn’t exist, create it before you begin.
2. Define the minimum data contract. For the specific use case, determine which fields are essential, which values are valid, and what level of latency is acceptable. You don’t need a complete enterprise data catalogue; you need to know exactly what the agent requires to operate correctly.
3. Establish a human review mechanism for high-impact decisions. Agents don’t have to be fully autonomous from day one. A design that keeps the operator in the loop for exceptions is more robust, easier to audit, and generates greater organisational trust. Autonomy should expand as the system demonstrates reliability, not the other way around.
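The data contract and the human review mechanism can be combined in a few lines of code. The Python sketch below is one possible shape, offered as an illustration rather than a prescribed design: `DataContract`, `violations`, `route`, the field names and the 60-second latency limit are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataContract:
    required_fields: frozenset   # fields the agent cannot work without
    valid_values: dict           # field name -> set of allowed values
    max_latency_seconds: float   # how old the data is allowed to be


def violations(contract, record, latency_seconds):
    """List every way the record breaks the contract."""
    found = []
    for name in sorted(contract.required_fields):
        if record.get(name) is None:
            found.append(f"missing {name}")
    for name, allowed in contract.valid_values.items():
        value = record.get(name)
        if value is not None and value not in allowed:
            found.append(f"invalid {name}: {value!r}")
    if latency_seconds > contract.max_latency_seconds:
        found.append("data too old")
    return found


def route(contract, record, latency_seconds, high_impact):
    """Send the case to a human if the data breaks the contract
    or the decision is flagged as high impact."""
    if high_impact or violations(contract, record, latency_seconds):
        return "human_review"
    return "agent"


contract = DataContract(
    required_fields=frozenset({"incident_id", "severity"}),
    valid_values={"severity": {"low", "medium", "high"}},
    max_latency_seconds=60.0,
)
ok = {"incident_id": "INC-1", "severity": "low"}
bad = {"incident_id": "INC-2", "severity": "urgent"}

print(route(contract, ok, 10.0, high_impact=False))   # agent
print(route(contract, bad, 10.0, high_impact=False))  # human_review
print(route(contract, ok, 10.0, high_impact=True))    # human_review
```

Expanding autonomy then becomes a matter of narrowing what counts as high impact, while the contract check stays in place.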

Building it right from the start

The adoption of AI agents in operations is a genuine opportunity for organisations with teams that possess deep domain expertise and clearly defined problems. But that opportunity only materialises if the data foundations are in place.

This isn’t about perfection. It’s about not building on sand. An agent operating on well-governed data, with traceability and access controls clearly defined, doesn’t just perform better: it’s an agent the organisation can trust. And in critical operations, trust in the system is just as important as its accuracy.

The teams that begin this journey with their data in order, ownership clearly assigned, and observability designed from the outset are the ones that, six months later, have agents in production delivering real work. The ones that don’t are still running pilots that remain just that: pilots.

Sergio Cambelo

Cloud Architect at Keepler. "Knowledge is the foundation of what we do every day to help our customers to achieve their goals. I like to design complex cloud architectures but what I enjoy the most is learning and sharing the knowledge with others to bring value to Keepler and their customers."

June 16, 2026
