By the end of 2025, roughly one in six people worldwide were using AI tools, according to Sudhir Hasbe, President and CPO at Neo4j. Yet as budgets tighten, many CEOs still struggle to point to clear revenue gains or cost reductions from their AI investments, despite heavy spending on the technology. The problem, Hasbe writes, is not that AI has stalled (models are stronger, cheaper, and easier to deploy than ever) but that it is stuck in the messy space between experimentation and production, a phase he calls 'AI pilot purgatory'.
The Rise of 'Pilot Purgatory'
Many organizations are now trapped in a never-ending loop of pilots and proofs of concept. Small teams can easily spin up agents that work in a sandbox, but when asked to scale across departments, integrate with live systems, or stand up to audit and risk scrutiny, these projects often fall apart. Typically, the project simply slows down, stalls, and finally comes to a halt. Ownership becomes unclear, confidence drains away, and nobody wants to sponsor the move to production.
Reframing Around Context, Not Models
These projects fail, Hasbe argues, because too much attention is paid to model choice and prompt design, and not enough to 'context' — the connective tissue of organizational knowledge. Most AI systems are deployed without access to that context. As agents take on more autonomy, users expect them to behave like junior colleagues: justifying decisions, citing policy, and adapting as rules change. If they cannot, they quickly become a liability. Logs and dashboards record what happened but strip it of meaning; a timestamped action tells little about intent.
Why Traceability Beats Explainability
Instead of treating context as a pile of documents to retrieve information from, some organizations are beginning to model it explicitly with context graphs: connected maps of decision history that make organizational judgments searchable, traceable, and reusable. The idea is to link people, policies, systems, decisions, and outcomes into a connected structure that evolves over time. Crucially, a context graph captures decision traces: what happened, which policies were applied, what exceptions were made, and the reasoning behind the outcome. That is far more than traditional tools record.
For example, a policy might state that discounts above a certain threshold require senior sign-off. Yet the actual conversation that led to an exception — emails explaining strategic rationale, team messaging threads where colleagues validated the reasoning, and informal approvals — may not be logged in any specific application. Decision traces fill that gap, surfacing not just what happened, but why. This approach addresses organizational amnesia: when teams change, policies shift, and systems are replaced, a connected context layer allows learning to accumulate. Agents can inherit institutional knowledge rather than improvising each time, making it far easier to spot patterns, exceptions, or outlier decisions that trace back to the same broken step.
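To make the idea concrete, here is a minimal sketch of the discount example as a decision trace in a small in-memory graph. It is an illustration under stated assumptions, not a method described by Hasbe or the API of any graph database: the `ContextGraph` class, the node identifiers, and the relation names (`GOVERNED_BY`, `EXCEPTION_TO`, `APPROVED_BY`, `SUPPORTED_BY`) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ContextGraph:
    """Hypothetical in-memory graph: nodes hold properties, edges are typed links."""
    nodes: dict = field(default_factory=dict)  # node id -> properties
    edges: list = field(default_factory=list)  # (source id, relation, target id)

    def add_node(self, node_id, kind, **props):
        self.nodes[node_id] = {"kind": kind, **props}

    def link(self, source, relation, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before linking")
        self.edges.append((source, relation, target))

    def targets(self, source, relation):
        """Nodes that `source` points to through `relation`."""
        return [t for s, r, t in self.edges if s == source and r == relation]

    def sources(self, relation, target):
        """Nodes that point to `target` through `relation`."""
        return [s for s, r, t in self.edges if r == relation and t == target]

    def trace(self, decision_id):
        """Gather what happened and why for a single decision."""
        decision = self.nodes[decision_id]
        return {
            "what": decision["summary"],
            "why": decision["reasoning"],
            "policies": self.targets(decision_id, "GOVERNED_BY"),
            "exception_to": self.targets(decision_id, "EXCEPTION_TO"),
            "approved_by": self.targets(decision_id, "APPROVED_BY"),
            "evidence": self.targets(decision_id, "SUPPORTED_BY"),
        }


# Illustrative data only; every identifier and value below is made up.
graph = ContextGraph()
graph.add_node("policy:discount-approval", "Policy",
               rule="discounts above the threshold require senior sign-off")
graph.add_node("person:senior-approver", "Person", role="senior approver")
graph.add_node("email:rationale", "Evidence", channel="email")
graph.add_node("chat:validation", "Evidence", channel="team messaging")
graph.add_node("decision:deal-1", "Decision",
               summary="discount above threshold granted",
               reasoning="strategic rationale explained by email")

graph.link("decision:deal-1", "GOVERNED_BY", "policy:discount-approval")
graph.link("decision:deal-1", "EXCEPTION_TO", "policy:discount-approval")
graph.link("decision:deal-1", "APPROVED_BY", "person:senior-approver")
graph.link("decision:deal-1", "SUPPORTED_BY", "email:rationale")
graph.link("decision:deal-1", "SUPPORTED_BY", "chat:validation")

deal_trace = graph.trace("decision:deal-1")
past_exceptions = graph.sources("EXCEPTION_TO", "policy:discount-approval")
```

Here `deal_trace` bundles the outcome with the policy, approver, and supporting messages, while `past_exceptions` lists every decision that departed from the same policy, which is the kind of lookup that lets recurring exceptions surface as a pattern. A production system would more likely keep this in a graph database than in Python lists.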
What Scaling AI Actually Demands
Scaling AI, Hasbe writes, is less a technical upgrade than an organizational one. It forces uncomfortable questions about data quality, ownership, permissions, and accountability: who sets policy, who can override it, and how exceptions are handled. These are not problems a better model can solve. To move past pilot purgatory, organizations should start small, within specific decision domains, rather than launching grand AI programs. They should map the rules, actors, and outcomes involved, then let agents operate within that bounded context. As trust grows, the scope expands. Over time, what emerges is not just an AI system but a living map of how the organization works.
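One way to picture a bounded decision domain is as an explicit record of which actions are in scope, where the rule's threshold sits, and which roles may override it. The sketch below is a hypothetical illustration of that idea; the `DecisionDomain` class, the `route` function, the role names, and the 15% limit are assumptions made for the example and do not come from Hasbe's piece.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionDomain:
    """Hypothetical description of one bounded decision domain."""
    name: str
    actions: frozenset         # actions the agent may take at all
    discount_limit: float      # rule: above this, sign-off is required
    override_roles: frozenset  # actors allowed to approve exceptions


def route(domain, action, discount, approver_role=None):
    """Decide how an agent's proposed action should be handled."""
    if action not in domain.actions:
        return "out_of_scope"            # agent stays inside its domain
    if discount <= domain.discount_limit:
        return "proceed"                 # covered by the standing rule
    if approver_role in domain.override_roles:
        return "proceed_as_exception"    # worth recording as a decision trace
    return "escalate"                    # needs a human with override rights


# Illustrative values only.
pricing = DecisionDomain(
    name="pricing",
    actions=frozenset({"approve_discount"}),
    discount_limit=0.15,
    override_roles=frozenset({"sales_director"}),
)

small_discount = route(pricing, "approve_discount", 0.10)
large_discount = route(pricing, "approve_discount", 0.30)
signed_off = route(pricing, "approve_discount", 0.30, approver_role="sales_director")
unrelated = route(pricing, "issue_refund", 0.0)
```

Expanding scope as trust grows then means adding actions or raising limits in the domain record, a change that is itself visible and auditable.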
The next phase of enterprise AI will not be won by those chasing gains in model performance. It will be shaped by those who invest in building and maintaining context graphs that preserve institutional memory.