The generative AI boom has made creating impressive prototypes deceptively simple. Yet, a quiet stagnation plagues the corporate world, where countless AI pilots never see the light of day. The leap from a controlled demo to a reliable business asset is a chasm where most initiatives fall, tripped up by the difficult problems of data engineering and governance. This challenge of scaling enterprise AI, a topic we’ve previously examined in ‘Agentic AI Systems: Databricks on the Shift in Enterprise AI’ [1], goes far beyond model selection. To dissect this critical failure point, we turn to Franny Hsiao, EMEA Leader of AI Architects at Salesforce. In this article, she reveals the architectural oversights that doom projects and outlines the practical steps needed to build AI systems that actually survive in the real world.
- The ‘Pristine Island’ Trap: Why Data Infrastructure Is the Scaling Killer
- Building Trust Through Transparency and Perceived Speed
- Beyond the Data Center: The Imperative of On-Device and Edge AI
- Architecting Accountability: Governance with a Human in the Loop
- The Future is Interoperable: Multi-Agent Orchestration and ‘Agent-Ready’ Data
- From Model-Centric Hype to Infrastructure-Centric Reality
The ‘Pristine Island’ Trap: Why Data Infrastructure Is the Scaling Killer
The journey from a promising AI pilot to a scalable enterprise asset is littered with failures. Most of these stem not from the model itself, but from the environment in which it was born. Pilots frequently begin in controlled settings that create a false sense of security, a critical flaw Franny Hsiao identifies as the ‘pristine island’ trap. These projects are developed using small, perfectly clean, and curated datasets, making the AI appear far more capable than it is in a real-world context. This approach builds a successful demo but an unscalable product, as it completely ignores the messy reality of enterprise AI data.
When companies attempt to scale these island-based pilots, the systems break. The model is suddenly exposed to the chaotic waters of siloed, inconsistent, and incomplete data, requiring complex integration, normalization, and transformation just to be usable. The consequences are predictable and severe: data gaps emerge, performance plummets, and inference latency renders the tool unusable. More importantly, this failure erodes the most critical asset for any AI system: the business’s trust in its output.
This isn’t just a technical hurdle; it’s a foundational strategic error. Hsiao is unequivocal on this point, stating that “The single most common architectural oversight that prevents AI pilots from scaling is the failure to architect an AI data platform for enterprise, a production-grade data infrastructure with built-in end to end governance from the start.” [1]. This concept is the key to bridging the gap between pilot and production. A production-grade data infrastructure refers to a robust and reliable data system designed to handle the scale, complexity, and security requirements of real-world enterprise operations. It ensures data is consistently available, accurate, and performant for AI models in live environments.
A cornerstone of this approach is establishing strong data governance from day one, a challenge we delved into in our article ‘Agentic AI Systems: Databricks on the Shift in Enterprise AI’ [2]. While the emphasis on architectural oversights might downplay significant challenges in model robustness and ethical AI, these are often secondary problems. Without a data foundation capable of surviving contact with reality, most projects will never even get the chance to face them.
Building Trust Through Transparency and Perceived Speed
As enterprises deploy large reasoning models, they inevitably collide with a fundamental trade-off: the depth of an AI’s ‘thinking’ versus a user’s patience. Complex, multi-step reasoning requires heavy computation, which introduces delays that can kill enterprise AI adoption before it even begins. When these systems are scaled, performance issues like inference latency become critical. This is the delay between an AI model receiving an input and producing an output or prediction. High inference latency can make AI systems feel slow and unresponsive, rendering them unusable and, more importantly, untrustworthy by eroding user confidence.
To solve this, Salesforce is focusing on what Franny Hsiao calls ‘perceived responsiveness’ through a solution named ‘Agentforce Streaming.’ The strategy is elegantly simple: don’t make the user wait for the entire answer. Instead, the system delivers AI-generated responses progressively, streaming initial findings or acknowledgements while the reasoning engine performs heavy computation in the background. This approach is highly effective at reducing perceived latency, which is often the biggest barrier to deploying production-grade AI.
Beyond clever engineering, this approach leans heavily on the psychology of user trust. Transparency plays a functional role in managing expectations. As Hsiao notes, by surfacing progress indicators that show the reasoning steps or the tools being used, you do more than just keep users engaged. This visibility, combined with design elements like spinners and progress bars, builds trust by making the system’s process deliberate and understandable rather than a mysterious black box.
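The streaming pattern described above can be sketched as a generator that emits an immediate acknowledgement and per-step progress events before the final answer. This is an illustrative sketch of the general technique, not Agentforce Streaming’s actual implementation; the event labels and step structure are assumptions.

```python
from typing import Callable, Iterator

def stream_response(question: str,
                    reasoning_steps: list[tuple[str, Callable[[], str]]]) -> Iterator[str]:
    """Illustrative 'perceived responsiveness' pattern: emit an instant
    acknowledgement and a progress event per reasoning step, so the user
    sees activity while the heavy computation runs in the background."""
    yield f"ack: working on '{question}'"      # instant acknowledgement
    partials = []
    for label, step in reasoning_steps:
        yield f"progress: {label}"             # surface the reasoning step to the UI
        partials.append(step())                # heavy computation happens here
    yield "answer: " + " ".join(partials)      # final, complete answer

# Usage: the UI renders each event as it arrives instead of blocking.
steps = [
    ("searching knowledge base", lambda: "Found 3 relevant articles."),
    ("drafting summary", lambda: "Reset the router, then re-pair the device."),
]
events = list(stream_response("How do I fix error E42?", steps))
```

In a real deployment the same shape maps naturally onto server-sent events or a chunked HTTP response, with the progress labels driving the spinners and step indicators the article describes.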
However, this focus on perception has its own technical risks. Critics argue that ‘perceived responsiveness’ and transparency mechanisms could simply mask underlying performance issues. Over-reliance on these UX strategies without addressing core latency problems can lead to system instability and a poor user experience under heavy load, ultimately causing more frustration if actual processing times are consistently slow. Successful enterprise AI adoption, therefore, requires a balanced approach: baking in end-to-end observability and guardrails alongside these perceived responsiveness mechanisms to holistically manage both system performance and user trust.
Beyond the Data Center: The Imperative of On-Device and Edge AI
While the narrative around enterprise AI often centers on the immense power of the cloud, a growing number of edge deployments show this model breaks down at the operational edge. For industries with extensive field operations, such as utilities or logistics, reliance on continuous cloud connectivity is simply not a viable option. As Franny Hsiao notes, many enterprise use cases are driven by the absolute need for offline functionality. This edge-versus-cloud trade-off is pushing the industry beyond the data center and toward on-device and edge AI.
The practical application of this shift is transformative. Consider a field technician in a remote location without a network signal. Using an on-device system, they can photograph a faulty part or an error code. A compact, on-device LLM can then instantly identify the asset and retrieve guided troubleshooting steps from a cached knowledge base stored directly on their handheld device. The entire diagnostic and repair workflow happens immediately, without any dependency on a network connection. Once connectivity is restored, the device seamlessly handles the data synchronization, uploading service logs and field notes to the central system to maintain a single source of truth.
This move to the edge is fueled by clear business imperatives. On-device intelligence and edge AI are critical for enterprise use cases requiring offline functionality, offering tangible benefits like the ultra-low latency needed for real-time decision-making, enhanced privacy by processing sensitive data locally, and significant cost savings from reduced data transmission. However, this decentralized approach is not without its own hurdles. Managing and updating distributed on-device models and their knowledge bases introduces significant operational complexity. Moreover, this architecture presents new security risks, as decentralized AI models and their synchronization processes increase the potential attack surface, making the task of AI enterprise data protection and security more challenging.
Architecting Accountability: Governance with a Human in the Loop
One of the most critical misunderstandings about autonomous agents is treating them as ‘set-and-forget’ tools. According to Franny Hsiao, scaling enterprise AI responsibly means architecting for accountability from day one. This requires a robust governance framework that defines precisely when and how human oversight is required, transforming governance from a reactive measure into a proactive design principle.
The cornerstone of this framework is the ‘human-in-the-loop’ approach. Human-in-the-loop AI systems integrate human oversight and intervention into an automated system, especially for critical decisions or actions. This ensures accountability, allows for continuous learning, and builds trust in AI systems by requiring human verification at key points. At Salesforce, this is implemented through what Hsiao terms ‘high-stakes gateways.’ These are specific action categories where human approval is mandatory. Examples include any ‘CUD’ actions – Creating, Updating, or Deleting data – as well as any direct contact with customers. By mandating this verification, the system creates a powerful feedback loop where agents learn directly from human expertise, fostering a system of ‘collaborative intelligence’ rather than unchecked automation.
However, for a human to be effectively ‘in the loop,’ they need clear visibility into the agent’s reasoning. Trusting an agent requires the ability to see its work. To provide this, Salesforce developed a ‘Session Tracing Data Model (STDM),’ an observability tool that captures ‘turn-by-turn logs’ of every agent interaction. This data provides granular, step-by-step visibility into the user’s question, the planner’s steps, tool calls, inputs and outputs, and any errors encountered. This rich dataset fuels critical functions like ‘Agent Analytics’ to measure adoption, ‘Agent Optimisation’ to refine performance, and ‘Health Monitoring’ for tracking uptime and latency.
Of course, this approach is not without its challenges. Mandating human intervention for all high-stakes gateways could introduce bottlenecks, potentially negating the very efficiency gains that automation promises and increasing AI implementation costs. Furthermore, there is a significant human resource risk to consider. If operators perceive these verification steps as an unnecessary burden or a sign of mistrust in their expertise, it can lead to resistance and significant adoption challenges. Ultimately, successful governance lies in striking a delicate balance: leveraging human wisdom to guide autonomous power without stifling its potential.
The Future is Interoperable: Multi-Agent Orchestration and ‘Agent-Ready’ Data
As businesses increasingly deploy autonomous agents from various vendors, the next frontier of enterprise AI is not about individual agent capability, but collective intelligence. The primary challenge becomes enabling these disparate systems to collaborate effectively, a complex task known as multi-agent orchestration. This refers to the coordination and management of multiple independent AI agents that need to work together to achieve a larger goal. It involves defining how these agents communicate, share information, and collaborate effectively to prevent conflicts and ensure unified action. Without a common multi-agent orchestration framework, enterprises risk creating a collection of siloed, inefficient tools and facing significant integration risk.
Franny Hsiao argues that solving this requires a two-layered approach rooted in open standards, which she deems ‘non-negotiable’ to prevent vendor lock-in and foster innovation. The first layer addresses the mechanics of communication. For this, Salesforce is adopting open-source standards like MCP (Model Context Protocol) and A2A (Agent to Agent Protocol), and co-founded OSI (Open Semantic Interchange) [2]. The second layer tackles the more profound challenge of meaning. While protocols can connect agents, a unified semantic protocol like OSI ensures they truly understand each other’s intent.
However, even perfect agent-to-agent communication is insufficient if the underlying data is inaccessible or fragmented. This pivots the conversation to the ultimate hurdle: creating agent-ready data. Hsiao predicts the future of AI scaling hinges not on building bigger models, but on transforming legacy systems into the searchable, context-aware data architectures that this concept implies. It is this fundamental shift in data infrastructure that will finally unlock the hyper-personalized and transformative user experiences that enterprise AI has long promised.
From Model-Centric Hype to Infrastructure-Centric Reality
As Franny Hsiao aptly concludes, the defining challenge ahead isn’t about the race for bigger, newer models; it’s about building the orchestration and data infrastructure that allows production-grade agentic systems to thrive. This pivot from a model-centric to an infrastructure-centric mindset encapsulates the path to scaling enterprise AI successfully. The journey requires a solid foundation built upon several critical pillars: a production-grade data architecture, transparent systems that earn user trust, the practical utility of edge AI, accountability through human-in-the-loop governance, and a commitment to an open, interoperable ecosystem.
The future of enterprise AI will likely diverge based on how seriously organizations address these architectural imperatives. A positive scenario sees enterprises implementing robust data governance and open standards, leading to scalable AI deployments that drive significant business value. In a neutral outcome, adoption progresses steadily, but challenges in data integration and multi-vendor interoperability limit the full potential of agentic systems. The negative path is one where architectural oversights and data fragmentation cause widespread pilot failures, escalating costs, and a deep distrust in AI’s ability to deliver real-world impact. Ultimately, turning AI’s promise into a transformative business asset depends entirely on this foundational work.
Frequently asked questions
What is the primary reason enterprise AI pilots fail to scale?
The single most common architectural oversight preventing AI pilots from scaling is the failure to architect a production-grade AI data platform for enterprise, with built-in end-to-end governance from the start. Pilots often begin with small, perfectly clean datasets, creating a ‘pristine island’ trap that ignores the messy reality of enterprise data. When scaled, these systems break due to siloed, inconsistent, and incomplete data, leading to performance issues and eroding trust.
How does Salesforce address high inference latency in enterprise AI systems?
Salesforce addresses high inference latency through ‘perceived responsiveness’ using a solution called ‘Agentforce Streaming.’ This strategy delivers AI-generated responses progressively, streaming initial findings while heavy computation occurs in the background. This approach reduces the feeling of latency and builds user trust by making the system’s process deliberate and understandable with progress indicators.
Why is on-device and edge AI becoming crucial for enterprise use cases?
On-device and edge AI are crucial for enterprise use cases requiring offline functionality, especially in industries with extensive field operations like utilities or logistics. This approach offers ultra-low latency for real-time decision-making, enhanced privacy by processing sensitive data locally, and significant cost savings from reduced data transmission. It allows critical workflows to happen immediately without continuous cloud connectivity.
What is the ‘human-in-the-loop’ approach in AI governance?
The ‘human-in-the-loop’ approach integrates human oversight and intervention into automated AI systems, particularly for critical decisions or actions. At Salesforce, this is implemented through ‘high-stakes gateways’ where human approval is mandatory for actions like Creating, Updating, or Deleting data, or direct customer contact. This ensures accountability, allows for continuous learning, and builds trust by requiring human verification at key points.
How does multi-agent orchestration help solve enterprise AI challenges?
Multi-agent orchestration solves enterprise AI challenges by coordinating and managing multiple independent AI agents to work together towards a larger goal. It defines how agents communicate, share information, and collaborate effectively, preventing conflicts and ensuring unified action. This approach, rooted in open standards like MCP and A2A, is essential for enabling disparate systems from various vendors to achieve collective intelligence and avoid vendor lock-in.
