There is a pattern that keeps repeating itself across many organisations right now. An operations team identifies a specific problem that seems perfect for an AI agent: incident triage, automatic escalation, resolution of recurring alerts. The team has genuine domain expertise, motivation, and access to modern tools. They start building. And within a few weeks, the same obstacle appears. Always the same one: the data isn’t where it should be, it doesn’t mean what it appears to mean, or it can’t be trusted enough to allow an AI to make decisions based on it.
It’s not a technology problem. It’s a data foundations problem.
This article isn’t about how to build AI agents. It’s about what needs to be in place before you start, so that when the agent reaches production, it doesn’t become a source of incorrect decisions that are difficult to trace.
The profile adopting agents the fastest right now
Operations teams are probably the group that can extract the most value from AI agents in the short term. They understand their processes in depth, they have an endless list of repetitive tasks that consume time, and they are able to define precisely what a good decision looks like versus a bad one.
But precisely because of that, they are also the most exposed to risk: they build on top of operational data that was never designed to be consumed by automated systems. Logs from legacy systems, records stored in inconsistent formats, statuses that only make sense if you understand the historical context of the process. Data that an experienced operator interprets correctly because they have spent years working with it. Data that an agent will interpret literally, without that context.
The most common outcome isn’t a visible catastrophic failure. It’s something more subtle and more dangerous: an agent that works well most of the time, say 80%, and fails silently the rest of the time. And without the right mechanisms in place, nobody knows until the issue has already escalated.
What does it mean to have AI-ready data?
When we talk about preparing data for an AI agent, we’re not talking about having a perfectly organised data lake or having completed a three-year data strategy. We’re talking about answering four specific questions before writing the first line of code for the agent:
Where does the data the agent will use come from?
It sounds obvious, but in practice many teams start building on data sources they don’t fully control: third-party system APIs, manual exports, operational databases with no availability SLAs. If the agent depends on a source that may become unavailable or change format without warning, the agent will inherit that fragility.
What does each field mean, and who decides that?
Operational data is full of implicit conventions. A field called “status” may have twelve possible values, three of which are technically equivalent but are used in different systems for historical reasons. Without at least a basic data dictionary, the agent will learn those inconsistencies and reproduce them in its decisions.
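As an illustration, a basic data dictionary for a single field can be as small as a mapping from every raw value to one canonical meaning. The sketch below is a minimal example under assumed names: the status values, `STATUS_DICTIONARY` and `canonical_status` are hypothetical and do not come from any specific system.

```python
# Hypothetical data dictionary for a "status" field. Each raw value seen in
# the source systems maps to one canonical meaning; the values are invented
# for illustration only.
STATUS_DICTIONARY = {
    "CLOSED": "resolved",
    "DONE": "resolved",
    "RESOLVED": "resolved",
    "OPEN": "open",
    "NEW": "open",
    "IN_PROGRESS": "in_progress",
}


def canonical_status(raw_value: str) -> str:
    """Return the canonical status, or raise if the value is undocumented."""
    key = raw_value.strip().upper().replace(" ", "_")
    if key not in STATUS_DICTIONARY:
        # Failing loudly is a design choice: an undocumented value is a
        # question for the data owner, not something the agent should guess.
        raise ValueError(f"Undocumented status value: {raw_value!r}")
    return STATUS_DICTIONARY[key]
```

With this in place, the three historically equivalent values all reach the agent as `resolved`, and any value nobody has documented stops the pipeline instead of silently shaping a decision.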
Who can access what, and under what conditions?
AI agents are not human users, but they access data with exactly the same privacy and compliance implications. In regulated sectors such as airlines, banking, insurance or healthcare, this isn’t optional. An agent accessing passenger information to resolve operational incidents requires exactly the same level of access control as any production system. If this isn’t defined from the outset, it becomes technical and regulatory debt that slows deployment.
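One simple way to make that access explicit is a per-agent allowlist of fields, applied before any record reaches the agent. This is a minimal sketch, not a replacement for the access controls of the underlying platform; the agent identifier, field names and `redact_for_agent` helper are assumptions made for the example.

```python
# Hypothetical allowlist: which fields each agent identity may read.
AGENT_FIELD_ALLOWLIST = {
    "incident-triage-agent": {"flight_number", "incident_type", "status"},
}


def redact_for_agent(agent_id: str, record: dict) -> dict:
    """Return only the fields this agent is allowed to see.

    Unknown agents get an empty record (deny by default).
    """
    allowed = AGENT_FIELD_ALLOWLIST.get(agent_id, set())
    return {name: value for name, value in record.items() if name in allowed}


# Example with invented data: the passenger name never reaches the agent.
incident = {
    "flight_number": "XX123",
    "incident_type": "delay",
    "status": "open",
    "passenger_name": "Jane Doe",
}
visible = redact_for_agent("incident-triage-agent", incident)
```

The deny-by-default behaviour is deliberate: an agent that has not been registered sees nothing, which keeps the access decision with whoever maintains the allowlist.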
How will we know if the agent is making decisions based on incorrect data?
This is the question that the fewest teams ask before they start, and it is the most important one. Observability for an agent isn’t just about monitoring whether it responds or not. It’s about being able to trace, for any decision it makes, what data it consulted, when it consulted it, and what values it used. Without that, debugging incorrect behaviour becomes almost impossible.
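A decision trace of this kind can start as a small structured record written alongside every decision. The sketch below assumes a hypothetical `DecisionTrace` class and invented source and field names; in practice the output would go to whatever logging or tracing system the team already uses.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class DataRead:
    """One value the agent consulted: where it came from, what it was, when."""
    source: str
    field_name: str
    value: str
    read_at: str


@dataclass
class DecisionTrace:
    """Everything needed to reconstruct a single agent decision later."""
    decision_id: str
    action: str
    reads: list = field(default_factory=list)

    def record_read(self, source: str, field_name: str, value) -> None:
        # Store the value as text together with a UTC timestamp of the read.
        timestamp = datetime.now(timezone.utc).isoformat()
        self.reads.append(DataRead(source, field_name, str(value), timestamp))

    def to_json(self) -> str:
        # asdict() also converts the nested DataRead entries to plain dicts.
        return json.dumps(asdict(self))


# Example with invented identifiers.
trace = DecisionTrace(decision_id="d-001", action="escalate")
trace.record_read("alerts_db", "severity", "high")
trace.record_read("alerts_db", "open_minutes", 42)
serialized = trace.to_json()
```

Each serialized trace answers the three questions above for one decision: which data was consulted, when, and with what values.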
Data governance isn’t bureaucracy. It’s what makes scale possible.
There is a widespread perception among business teams that data governance is an IT initiative that slows things down. In the context of AI agents, that perception is particularly costly.
An agent in production makes decisions continuously and autonomously. Every incorrect decision carries a cost: operational, financial, or in terms of user trust. Unlike a manual process, where an operator can correct issues as they arise, an agent scales errors just as quickly as it scales success.
This doesn’t mean organisations need to complete a two-year data governance programme before launching their first agent. It means identifying, for the specific use case, which data is critical, who owns it, and what level of quality is acceptable. It’s an exercise that can be completed in days if the scope is well defined.
The difference between AI initiatives that reach production and those that remain trapped in perpetual pilot mode rarely lies in the model or the platform. It lies in whether the team did this preparatory work on the data.
What should a CIO or CDO do before authorising development?
If you’re in a decision-making position regarding whether to launch an AI agent initiative within your organisation, there are three concrete actions that make the difference between a pilot that scales and one that doesn’t:
- Identify the owner of every data source the agent will consume. Not the technical owner of the system, but the business stakeholder who can answer whether the data is accurate and complete. If that role doesn’t exist, create it before you begin.
- Define the minimum data contract. For the specific use case, determine which fields are essential, which values are valid, and what level of latency is acceptable. You don’t need a complete enterprise data catalogue; you need to know exactly what the agent requires to operate correctly.
- Establish a human review mechanism for high-impact decisions. Agents don’t have to be fully autonomous from day one. A design that keeps the operator in the loop for exceptions is more robust, easier to audit, and generates greater organisational trust. Autonomy should expand as the system demonstrates reliability, not the other way around.
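The second and third actions can be sketched together: a minimum data contract checked on every record, and a routing rule that sends high-impact decisions to an operator. Everything below is illustrative; the field names, the valid severities, the 300-second freshness limit and the rule that "high severity means high impact" are assumptions to be replaced with whatever the data owner agrees for the use case.

```python
# Hypothetical minimum data contract for an incident-triage agent.
REQUIRED_FIELDS = {"incident_id", "status", "severity", "age_seconds"}
VALID_SEVERITIES = {"low", "medium", "high"}
MAX_AGE_SECONDS = 300  # assumed acceptable latency for this use case


def contract_violations(record: dict) -> list:
    """Return a list of human-readable contract violations (empty if none)."""
    missing = sorted(REQUIRED_FIELDS - set(record))
    problems = [f"missing field: {name}" for name in missing]
    if "severity" in record and record["severity"] not in VALID_SEVERITIES:
        problems.append(f"invalid severity: {record['severity']!r}")
    if record.get("age_seconds", 0) > MAX_AGE_SECONDS:
        problems.append("data older than the agreed latency")
    return problems


def route_decision(record: dict) -> str:
    """Decide who acts on a record: nobody, an operator, or the agent."""
    if contract_violations(record):
        return "reject"        # the data cannot be trusted, so do not decide
    if record["severity"] == "high":
        return "human_review"  # assumed definition of a high-impact decision
    return "auto"              # the agent may act autonomously


# Example with invented data.
sample = {"incident_id": "i-1", "status": "open", "severity": "low", "age_seconds": 10}
outcome = route_decision(sample)
```

Expanding autonomy then becomes a reviewable change to `route_decision` rather than a rewrite of the agent.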
Building it right from the start
The adoption of AI agents in operations is a genuine opportunity for organisations with teams that possess deep domain expertise and clearly defined problems. But that opportunity only materialises if the data foundations are in place.
This isn’t about perfection. It’s about not building on sand. An agent operating on well-governed data, with traceability and access controls clearly defined, doesn’t just perform better: it’s an agent the organisation can trust. And in critical operations, trust in the system is just as important as its accuracy.
The teams that begin this journey with their data in order, ownership clearly assigned, and observability designed from the outset are the ones that, six months later, have agents in production delivering real work. The teams that don’t are still running pilots that remain just that: pilots.
Sergio Cambelo
Cloud Architect at Keepler. "Knowledge is the foundation of what we do every day to help our customers achieve their goals. I like to design complex cloud architectures, but what I enjoy the most is learning and sharing that knowledge with others to bring value to Keepler and its customers."




