Build Agents Where the Data Is to Deliver Trusted, Scalable AI

Dr Pete Stanski, Chief Technology Officer at V2 AI
April 24, 2026

TL;DR: As AI agents scale across the enterprise, organisations are encountering new challenges: growing agent sprawl with inconsistent outputs, declining trust in AI decisions, and overlapping capabilities. The root cause is not model performance but a failure to bring AI to where trusted, governed data resides. This article explores why AI success now depends on rethinking data management.

In a recent financial sector client engagement, we built AI agents within Databricks to analyse customer 360 data containing behavioural patterns related to loyalty and churn. Instead of relying on fragmented integrations or external data pipelines, the agents accessed data where it already resided. Every agent recommendation was grounded in accurate, up-to-date information.

As a result, the agents were able to:

  • Mitigate risks by detecting anomalies in customer data in near real time

  • Increase profitability by identifying revenue leakage patterns 

  • Accelerate decision-making by surfacing actionable insights directly to operations teams through natural language interfaces

This allowed for clear next best actions such as proactive customer outreach, targeted retention offers, and personalised marketing interventions to reduce churn and increase lifetime value.

More importantly, it increased trust in AI outputs, enabling business teams to act on insights with confidence and clearer intent.
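
As an illustration, the sketch below shows roughly what one of these data-local agent tools can look like: a Python function that reads behavioural signals for a single customer directly from a governed lakehouse table via Spark SQL inside Databricks. This is a minimal sketch rather than the client implementation; the table, columns, and score fields are illustrative assumptions.

    # A minimal sketch (not the client implementation) of an agent tool that reads
    # customer 360 data in place, using the `spark` session available in a
    # Databricks notebook or job. Table and column names are assumptions.
    def get_churn_signals(customer_id: str) -> list[dict]:
        """Return recent behavioural signals for one customer from the governed table."""
        df = spark.sql(
            """
            SELECT event_date, product, activity_score, churn_risk_score
            FROM main.customer_360.behaviour_daily   -- hypothetical Unity Catalog table
            WHERE customer_id = :customer_id
            ORDER BY event_date DESC
            LIMIT 90
            """,
            args={"customer_id": customer_id},
        )
        return [row.asDict() for row in df.collect()]

Because the function runs where the data lives, every recommendation the agent makes is grounded in the same governed, current records that the rest of the business uses.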

Challenges with Current Enterprise AI Approaches

The core challenge is no longer how to build AI agents, but how to integrate them into enterprise systems with governance that makes them trusted. Many organisations are deploying agents across fragmented architectures where weak integration and inconsistent governance lead to unreliable outputs and limited visibility. Trusted and scalable AI, by contrast, emerges when integration and governance are treated as first-class citizens, coupling agents and governed data environments as one.

Poorly Designed Data Integrations

Across enterprises, there is a rapid expansion of AI capabilities, including customer-facing chat interfaces, internal copilots embedded within workflows, API orchestrations, and even real-time voice agents operating through telephony infrastructure. 

However, this growth often occurs without a corresponding evolution of the underlying data architecture. Many AI solutions integrate with existing enterprise systems without a clear understanding of the underlying data, data models, and semantic relationships. Some even operate through pure prompt engineering, with partial context and inconsistent data integration.

Such loosely designed data integrations appear highly effective in demos but frequently degrade in production, producing AI outputs that are incomplete, hallucinated, or misaligned with actual business conditions.

Misplaced Improvement Efforts

At times, the architecture of these solutions places most of its focus on foundation model selection and context window size as the primary levers for improving AI performance. However, larger context windows do not address underlying data quality issues. An agent operating on poor-quality data remains unreliable regardless of context window size; in practice, smaller but well-curated datasets often produce more accurate outcomes.

Lack of Clarity on Data Trust and Lineage

The fundamental questions that determine AI reliability are often not being addressed:

  • Where is the data coming from?

  • Can it be trusted?

  • Is it governed and current?

  • Does it reflect the operational reality of the business?

Without clear answers to these questions, AI systems cannot be relied upon for decision-making at scale.
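
Inside a governed platform, some of these questions can be answered directly. The hedged sketch below queries the Unity Catalog lineage system table to see where a table's data comes from; the table name is a placeholder, and the column names should be verified against the system tables documentation for your workspace.

    # A hedged sketch: trace upstream lineage for one table using Unity Catalog's
    # lineage system table. Runs in a Databricks notebook where `spark` exists;
    # the target table name is a placeholder.
    lineage = spark.sql(
        """
        SELECT source_table_full_name, target_table_full_name, event_time
        FROM system.access.table_lineage
        WHERE target_table_full_name = 'main.customer_360.churn_scores'  -- placeholder table
        ORDER BY event_time DESC
        LIMIT 50
        """
    )
    lineage.show(truncate=False)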

Data Gravity Is Becoming Impossible to Ignore

Data has gravity: as data volumes grow within a platform, the data naturally attracts applications, services, and workflows that operate closer to it. Over time, this creates a centre of mass where moving the data becomes harder, slower, and more expensive than moving the compute to the data, and more codified processes and services begin to orbit around it. For AI systems, the distance between an agent and the data it relies on directly affects accuracy, latency, and ultimately trust.

Consider the rise of voice agents in contact centres. Data has to flow between the user and the agent in real time, convert from audio to text and back, and still allow the AI to reason, all within a fraction of a second. If the agent sits far from the voice infrastructure or the customer data, latency increases and context degrades. Without immediate access to relevant information, the agent starts making decisions that lag behind the conversation. It becomes less reliable regardless of the model or AI infrastructure in use.

Regulatory and data sovereignty constraints further emphasise the importance of data gravity. Sovereign data laws limit the movement of sensitive data across borders. Regulatory frameworks, such as those set by APRA, impose strict requirements that AI systems must be explainable, auditable, and aligned with governance policies.

Pulling data into external AI systems introduces risk, complexity, and often non-compliance. The implication is clear: If data cannot move, AI must come to the data.

From AI Around the Platform to AI Within It

Platforms like Databricks are emerging as the central point of data gravity. In an AI-first world, Databricks is not just a unifying data store. It provides an environment where AI engineering, data engineering, analytics, governance, and identity can come together.

Instead of building agents around the edges of the platform, leading organisations are beginning to deploy both AI-driven and general-purpose applications serverlessly on the platform where their data already resides. 


This fundamentally changes how AI applications are built and operated. Instead of provisioning separate infrastructure, managing multiple APIs across heterogeneous platforms, or orchestrating complex deployment pipelines, teams can develop and deploy applications natively alongside their data, models, and governance controls. Everything runs within the same trusted boundary.
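
As a rough sketch of what developing alongside the data can look like, the example below is a small Streamlit app of the kind that can be hosted as a serverless Databricks App, querying a governed table through the Databricks SQL connector. The environment variable names, warehouse path, and table are assumptions used for illustration.

    # app.py: a minimal sketch of an app that could run next to the data as a
    # serverless Databricks App. Environment variable names, the warehouse HTTP
    # path, and the table below are illustrative assumptions.
    import os
    import streamlit as st
    from databricks import sql  # databricks-sql-connector

    connection = sql.connect(
        server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        access_token=os.environ["DATABRICKS_TOKEN"],
    )

    st.title("Churn watchlist")
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT customer_id, churn_risk_score "
            "FROM main.customer_360.churn_scores "  # hypothetical governed table
            "ORDER BY churn_risk_score DESC LIMIT 20"
        )
        st.dataframe(cursor.fetchall())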

The benefits are immediate.

AI Outputs Become More Accurate

Instead of relying on fragmented integrations, AI agents can draw directly from curated, governed datasets managed through Unity Catalog. This ensures consistent access to high-quality data, reducing the risk of hallucinations.

Built-In Governance, Lineage, and Auditability

Security and governance no longer need to be reimplemented for every AI application. They are inherited from Unity Catalog, so every application respects the same access controls, lineage, and audit requirements as the underlying data. The agent simply operates within those pre-existing security roles and boundaries.
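
For instance, the same Unity Catalog grants that govern human analysts can govern an agent's identity. A minimal sketch, with placeholder principal and table names:

    # A minimal sketch, assuming Unity Catalog-governed tables: the agent's
    # service principal receives ordinary grants, so no separate security layer
    # is built for the AI application. Principal and table names are placeholders.
    spark.sql("GRANT USE CATALOG ON CATALOG main TO `churn-agent-sp`")
    spark.sql("GRANT USE SCHEMA ON SCHEMA main.customer_360 TO `churn-agent-sp`")
    spark.sql("GRANT SELECT ON TABLE main.customer_360.behaviour_daily TO `churn-agent-sp`")
    # The agent's queries then appear in the same audit and lineage system tables
    # (for example system.access.audit) as every other workload.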

Simplified Access Management and Control

Identity becomes unified and vertically integrated across layers. With a single sign-on (SSO) experience, users can interact with data and AI through a consistent interface. Context is preserved across interactions, and every AI agent can operate under a Service Principal identity when accessing sensitive data. Even row-level security can be applied, so the AI only sees what it is allowed to access.
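
A hedged sketch of that last point, using a Unity Catalog row filter so the agent's service principal only ever sees the rows it is entitled to; the function, group, table, and column names are placeholders:

    # Row-level security sketch: a row filter function is attached to the table,
    # so any identity outside the named group, including the agent's service
    # principal, only sees its permitted rows. All names are placeholders.
    spark.sql(
        """
        CREATE OR REPLACE FUNCTION main.customer_360.au_rows_only(region STRING)
        RETURN IF(is_account_group_member('global_admins'), TRUE, region = 'AU')
        """
    )
    spark.sql(
        """
        ALTER TABLE main.customer_360.behaviour_daily
        SET ROW FILTER main.customer_360.au_rows_only ON (region)
        """
    )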

Cost-Effective Operations and Performance

Operational complexity is also lowered. There is one platform to manage, one place to observe behaviour, and one environment to scale.

Data movement is minimised by design. Instead of copying data into multiple systems, the agent accesses it where it lives. With Databricks Lakebase, this pattern also reduces latency and removes entire classes of integration challenges while permissions are enforced automatically.

Latency is reduced because applications interact with data locally rather than over distributed systems.
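
As a heavily hedged illustration, Lakebase exposes a Postgres-compatible endpoint, so a low-latency operational lookup can be written with an ordinary Postgres client. The host, database name, credential handling, and table below are all assumptions to be checked against the Lakebase documentation rather than a documented recipe.

    # A heavily hedged sketch of a low-latency lookup against a Lakebase
    # (Postgres-compatible) instance. Connection details, credential handling,
    # and the table are assumptions.
    import os
    import psycopg2

    conn = psycopg2.connect(
        host=os.environ["LAKEBASE_HOST"],               # assumed instance endpoint
        dbname=os.environ.get("LAKEBASE_DB", "postgres"),
        user=os.environ["DATABRICKS_CLIENT_ID"],        # assumed: service principal as DB user
        password=os.environ["DATABRICKS_OAUTH_TOKEN"],  # assumed: OAuth token as password
        sslmode="require",
    )
    with conn.cursor() as cur:
        cur.execute(
            "SELECT customer_id, next_best_action FROM recommendations WHERE customer_id = %s",
            ("C-1042",),  # hypothetical table and key
        )
        print(cur.fetchone())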

Accelerated Innovation

Serverless application hosting also enables rapid experimentation and iteration. Teams can move from prototype to production without re-architecting their solutions, allowing AI and traditional applications to evolve together. This is particularly powerful for AI use cases, where applications, agents, and data pipelines are tightly coupled and must scale dynamically with demand.  

Transforming How the Business Engages with Data

A unified interface brings together data, AI, dashboards and applications into a consistent experience. Identity, access, and context are preserved across workflows. This transforms decision-making within the organisation.

Once data is consolidated within Databricks, capabilities such as Databricks Genie and Databricks One fundamentally change how it is accessed and used across the organisation. Instead of relying on analysts or fragmented tools, business users can easily interact with governed enterprise data through natural language.
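
A hedged sketch of what that interaction can look like programmatically, using the Genie Conversation API through the Databricks Python SDK; the space ID is a placeholder, and the exact method names and response fields should be checked against your SDK version.

    # A hedged sketch: ask a natural-language question of governed data through a
    # Genie space via the Databricks SDK. The space ID is a placeholder; verify
    # the method names against your databricks-sdk version.
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()  # picks up workspace authentication from the environment

    message = w.genie.start_conversation_and_wait(
        space_id="01ef0000genie0space0id",  # placeholder Genie space ID
        content="Which customer segments show the highest churn risk this quarter?",
    )
    print(message.status)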

This natural-language access removes friction between teams and reduces dependency on intermediaries. Data moves from being a passive asset to an active decision-making layer embedded directly into business operations.

Final Words

AI should become a native extension of the enterprise, not just of the data platform. In some cases, this means bringing AI to centralised data platforms; in others, it means embedding AI into new business processes that orchestrate data across multiple systems. What really matters is how well your integrations, governance, and data lineage are maintained across the flow of information.

Organisations that get this balance right can move towards a coordinated intelligence layer embedded within their business, which means their agents can operate with clear context and consistent access to trusted data, all within well-defined governance boundaries. This leads to greater AI trust and adoption, allowing organisations to accurately measure AI success and ROI.
