User as Code: Executable Memory Paradigm Boosts AI Agent Personalization with Deterministic Execution

A new arXiv preprint introduces User as Code (UaC), a paradigm in which an AI agent's user memory is stored as executable Python code rather than unstructured text. On long-term conversation benchmarks, UaC achieves 78.8% recall on LOCOMO and 99% accuracy on aggregate questions, while enabling unsolicited safety alerts that retrieval-based systems cannot provide.

iGEN Editorial

June 16, 2026

Enterprise AI agents struggle to maintain consistent, actionable user memory across conversations. A new research paper posted to arXiv by Bojie Li introduces a paradigm called User as Code (UaC) that stores user profiles as executable Python programs, enabling deterministic reasoning and proactive, safety-critical alerts.

The Problem with Retrieval-Based Memory

According to the paper, current personalized AI agents almost always store user memory as unstructured text, a knowledge graph, or a flat store of facts, and consult it by retrieval: fetching the entries most similar to the current request. The authors call this "bag-of-facts" memory. It recalls individual facts well but struggles to resolve contradictions, aggregate over many records, or enforce rules. Because storing a fact and acting on it are separate steps, such memory cannot compute derived answers or raise alerts unless it is explicitly queried.
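To make the failure mode concrete, here is a toy bag-of-facts store. It is a sketch, not any system evaluated in the paper: it assumes a crude word-overlap (Jaccard) score as a stand-in for embedding similarity, and the `retrieve` function and sample facts are hypothetical. Because retrieval returns at most `k` entries, any count computed over the results is capped at `k`, however many matching facts exist.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word tokens: a crude stand-in for an embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(facts: list[str], query: str, k: int = 3) -> list[str]:
    """Hypothetical top-k retrieval by Jaccard overlap between query and fact tokens."""
    q = tokenize(query)

    def score(fact: str) -> float:
        f = tokenize(fact)
        union = q | f
        return len(q & f) / len(union) if union else 0.0

    # sorted() is stable, so ties keep their original (insertion) order.
    return sorted(facts, key=score, reverse=True)[:k]

# A flat "bag of facts"; contents are made up for illustration.
facts = [
    "User lives in Boston.",
    "User took an international trip to Tokyo in March.",
    "User took an international trip to Paris in May.",
    "User took an international trip to Toronto in July.",
    "User took an international trip to Lisbon in October.",
    "User moved to Seattle.",
]

top = retrieve(facts, "How many international trips did I take?", k=3)
for fact in top:
    print(fact)
# Both "lives in Boston" and "moved to Seattle" stay in the store; nothing reconciles them.
```

With `k=3`, only three of the four trips reach the model, so a count over the retrieved context undercounts; raising `k` only moves the ceiling.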

How User as Code Works

UaC treats user memory as a living software project. Typed Python objects hold the user's state, and ordinary Python functions encode the rules that govern that state, so representation and reasoning share one medium that an interpreter can run. The enabling mechanism is a two-phase pipeline: an append-only log that never discards a fact, periodically checkpointed into typed code. Because the memory is executable, the agent can run it to compute answers or enforce rules automatically.
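A minimal sketch of the two-phase idea follows. It assumes the checkpoint is a deterministic replay of the log into dataclasses; the paper's actual checkpointing into typed code may work differently, and `LogEntry`, `MemoryLog`, `UserState`, and `checkpoint` are hypothetical names. The log keeps every fact, while the checkpoint folds it oldest-first into typed state, so a later move overrides an earlier home city without deleting the older record.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class LogEntry:
    """One observed fact. Entries are only ever appended, never edited."""
    kind: str          # hypothetical kinds: "move" or "trip"
    payload: dict
    observed_on: date

@dataclass
class Trip:
    destination: str
    country: str
    start: date

@dataclass
class UserState:
    """Typed snapshot produced by a checkpoint."""
    home_city: Optional[str] = None
    home_country: Optional[str] = None
    trips: list[Trip] = field(default_factory=list)

class MemoryLog:
    """Append-only log: the full history of facts is always retained."""

    def __init__(self) -> None:
        self._entries: list[LogEntry] = []

    def append(self, entry: LogEntry) -> None:
        self._entries.append(entry)

    def entries(self) -> tuple[LogEntry, ...]:
        return tuple(self._entries)  # read-only view for callers

def checkpoint(log: MemoryLog) -> UserState:
    """Replay the log oldest-first into typed state; later facts win on conflicts."""
    state = UserState()
    for entry in sorted(log.entries(), key=lambda e: e.observed_on):
        if entry.kind == "move":
            state.home_city = entry.payload["city"]
            state.home_country = entry.payload["country"]
        elif entry.kind == "trip":
            state.trips.append(Trip(**entry.payload))
    return state

log = MemoryLog()
log.append(LogEntry("move", {"city": "Boston", "country": "US"}, date(2024, 1, 10)))
log.append(LogEntry("trip", {"destination": "Tokyo", "country": "JP", "start": date(2024, 3, 4)}, date(2024, 3, 1)))
log.append(LogEntry("move", {"city": "Seattle", "country": "US"}, date(2025, 6, 1)))

state = checkpoint(log)
print(state.home_city, len(state.trips), len(log.entries()))
```

Keeping the raw log separate from the typed snapshot means a checkpoint can always be rebuilt from scratch, since no fact was overwritten to produce it.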

Benchmark Performance

On standard long-term conversation benchmarks, UaC matches both a full-context upper bound and the strongest prior memory systems on recall, reaching 78.8% on the LOCOMO benchmark. Its advantage emerges where representation matters most. On aggregate questions over a user's history, such as "how many international trips did I take last year?", retrieval-based memory collapses to between 6% and 43%, while UaC stays near-perfect at 99%. The paper attributes this to the answer being a one-line computation over typed state rather than a search over text.

Metric	Retrieval-based memory	User as Code (UaC)
LOCOMO recall	Comparable to UaC (strongest prior systems)	78.8%
Aggregate-question accuracy	6%–43%	99%
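The aggregate question can be sketched as a single expression over typed state. This is an illustration, not the paper's code: the `Trip` type, the `international_trips` helper, and the rule that a trip counts as international when its country differs from the user's home country are all assumptions. The year is passed explicitly rather than resolving "last year" from the current date.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Trip:
    destination: str
    country: str  # assumed ISO 3166 alpha-2 code
    start: date

def international_trips(trips: list[Trip], home_country: str, year: int) -> int:
    """Count trips in the given year whose country differs from the home country."""
    return sum(1 for t in trips if t.start.year == year and t.country != home_country)

# Illustrative typed state for a US-based user.
trips = [
    Trip("Tokyo", "JP", date(2025, 3, 4)),
    Trip("Paris", "FR", date(2025, 5, 12)),
    Trip("Chicago", "US", date(2025, 6, 2)),      # domestic, not counted
    Trip("Toronto", "CA", date(2025, 7, 20)),
    Trip("Lisbon", "PT", date(2025, 10, 8)),
    Trip("Mexico City", "MX", date(2024, 11, 30)),  # previous year, not counted for 2025
]

print(international_trips(trips, home_country="US", year=2025))  # 4
```

Because the function scans every record, its answer does not depend on how many trips are stored, unlike a top-k retrieval whose result set is fixed in size.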

Moreover, because UaC's rules execute deterministically whenever the state changes, the system can surface unsolicited, safety-critical alerts, such as a newly prescribed drug that conflicts with an allergy recorded months earlier. The paper notes that query-driven memory cannot provide this capability.
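The write-triggered alert can be sketched as follows, assuming every state change goes through a single `apply` method that re-runs all registered rules afterwards. `UserState`, `Memory`, `allergy_conflict`, and the tiny `DRUG_CLASS` lookup are hypothetical and for illustration only; a real deployment would need a curated clinical source for drug-allergy conflicts.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Toy lookup from drug to the allergy class it belongs to (illustrative only).
DRUG_CLASS = {"amoxicillin": "penicillin", "cephalexin": "cephalosporin"}

@dataclass
class UserState:
    allergies: set[str] = field(default_factory=set)
    prescriptions: list[str] = field(default_factory=list)

# A rule inspects the whole state and returns an alert message or None.
Rule = Callable[[UserState], Optional[str]]

def allergy_conflict(state: UserState) -> Optional[str]:
    """Flag any current prescription whose drug class matches a recorded allergy."""
    for drug in state.prescriptions:
        drug_class = DRUG_CLASS.get(drug)
        if drug_class is not None and drug_class in state.allergies:
            return f"ALERT: {drug} conflicts with recorded {drug_class} allergy"
    return None

class Memory:
    """All writes go through apply(), which re-runs every rule after the change."""

    def __init__(self, rules: list[Rule]) -> None:
        self.state = UserState()
        self.rules = rules
        self.alerts: list[str] = []

    def apply(self, mutate: Callable[[UserState], None]) -> list[str]:
        mutate(self.state)
        fired = []
        for rule in self.rules:
            msg = rule(self.state)
            # Report each distinct alert once, even if later writes re-trigger it.
            if msg is not None and msg not in self.alerts:
                fired.append(msg)
        self.alerts.extend(fired)
        return fired

mem = Memory(rules=[allergy_conflict])
earlier = mem.apply(lambda s: s.allergies.add("penicillin"))         # recorded months ago
latest = mem.apply(lambda s: s.prescriptions.append("amoxicillin"))  # new prescription
print(earlier, latest)
```

The alert comes out of the write path itself: nobody asks about allergies when the prescription is recorded. Already-reported alerts are suppressed so that later, unrelated updates do not repeat them.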

Implications for Enterprise AI

For enterprise technology leaders, UaC offers a framework for building AI agents that maintain reliable, computable user profiles. Instead of sifting through conversational logs, agents can execute code to enforce compliance rules, aggregate usage patterns, or detect conflicts in real time. The use of typed Python objects and deterministic functions means that memory is auditable and testable — critical for regulated industries. While the research is still at the preprint stage, the architecture points toward more robust personalization in customer service bots, internal knowledge assistants, and any system requiring persistent, actionable user context.