
New Agentic Programming Framework Shifts Control from LLMs to Deterministic Code for Greater Reliability

A new paper argues that current LLM agent frameworks have architectural flaws leading to token explosion, control-flow hallucination, and unreliable completion. The authors propose Agentic Programming, where the program governs all control flow and the LLM is an adaptive component called LLM-as-Code, invoked only for reasoning or generation. A case study on computer-use agents shows improved stability in long visual operation sequences.

iGEN Editorial
June 16, 2026

Enterprises deploying AI agents for automation tasks often encounter reliability issues: agents go off-track, consume excessive tokens, or fail to complete sequences. A new paper on arXiv argues these problems are not merely implementation bugs but architectural consequences of the dominant design pattern that gives the LLM the role of orchestrator.

According to the paper titled "LLM-as-Code Agentic Programming for Agent Harness," every major LLM agent framework allows the model to decide what to do next, when to call tools, and when to stop. The researchers identify three persistent issues: token explosion, control-flow hallucination, and unreliable completion. They write: "A better prompt or a stronger model cannot guarantee the reliability of the LLM agent."

The fundamental problem, the authors argue, is assigning the deterministic work of looping, branching, and sequencing to a probabilistic system. To solve this, they propose Agentic Programming, a paradigm in which the program governs all control flow and the LLM is one part of that program: an adaptive component they call LLM-as-Code. The LLM is invoked only where a task calls for reasoning or generation. Within each call the model keeps full flexibility, but it cannot alter the program's execution path.
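The division of labor described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the document-sorting task, the names `classify`, `run_pipeline`, `ALLOWED_LABELS`, and the offline `fake_llm` stand-in are all hypothetical. The point is only that the loop, the branches, and the stop condition live in ordinary code, while the model's output is treated as data.

```python
from typing import Callable

# Hypothetical stand-in for a model call: takes a prompt, returns text.
LLM = Callable[[str], str]

# The only branches the program will ever take (an assumption of this sketch).
ALLOWED_LABELS = ("invoice", "receipt", "other")


def classify(llm: LLM, text: str) -> str:
    """Ask the model for a label, then validate the answer in code."""
    answer = llm(f"Classify as one of {ALLOWED_LABELS}: {text}").strip().lower()
    # Anything outside the allowed set maps to a default label, so the
    # model's reply cannot introduce a new branch or a new action.
    return answer if answer in ALLOWED_LABELS else "other"


def run_pipeline(llm: LLM, documents: list[str]) -> dict[str, list[str]]:
    """Loop, branching, and termination are decided by the program."""
    buckets: dict[str, list[str]] = {label: [] for label in ALLOWED_LABELS}
    for doc in documents:            # the program decides iteration
        label = classify(llm, doc)   # the LLM is invoked only for judgment
        buckets[label].append(doc)   # the program decides what happens next
    return buckets                   # ends after one model call per document


def fake_llm(prompt: str) -> str:
    """Toy model so the sketch runs offline; not a real API."""
    lowered = prompt.lower()
    if "total due" in lowered:
        return "Invoice"
    if "thank you for your purchase" in lowered:
        return "receipt"
    # A reply that tries to steer control flow is just an invalid label.
    return "I think we should stop here and call a tool"


if __name__ == "__main__":
    docs = ["Total due 40 EUR", "Thank you for your purchase", "Random memo"]
    print(run_pipeline(fake_llm, docs))
```

In this sketch the third reply attempts to redirect the run, but the program simply files the document under "other" and continues.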

With control in the program, the LLM's context is built from the execution history's call tree and forms a directed acyclic graph (DAG). Each call's context length is then determined by its call depth rather than by accumulation over steps. This design prevents context length from growing unboundedly, reducing token consumption and improving determinism.
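A rough sketch of why call depth, rather than step count, can bound the context is shown below. It rests on a loud assumption that the article does not confirm: that each call sees only its chain of ancestors in the call tree. The paper describes a DAG, of which this tree is a simplified special case, and the `Call` class, `build_context`, and the form-filling step names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Call:
    """One node in a hypothetical call tree of LLM-backed functions."""
    name: str
    parent: Optional["Call"] = None
    children: list["Call"] = field(default_factory=list)

    def child(self, name: str) -> "Call":
        """Record a nested call made from this call."""
        node = Call(name, parent=self)
        self.children.append(node)
        return node

    def depth(self) -> int:
        """Number of ancestors between this call and the root."""
        return 0 if self.parent is None else self.parent.depth() + 1


def build_context(call: Call) -> list[str]:
    """Assumed rule: a call sees only its ancestors, listed root first."""
    path: list[str] = []
    node: Optional[Call] = call
    while node is not None:
        path.append(node.name)
        node = node.parent
    return list(reversed(path))


def flat_history_length(step_index: int) -> int:
    """Baseline for comparison: a transcript that keeps every earlier step."""
    return step_index + 1


# One root task followed by 50 sibling steps at depth 1.
root = Call("fill_form")
steps = [root.child(f"click_field_{i}") for i in range(50)]
last = steps[-1]

if __name__ == "__main__":
    print(build_context(last), flat_history_length(50))
```

Under this assumed rule, the fiftieth sibling step still has a two-entry context (the root plus itself), whereas an accumulating transcript would hold 51 entries by that point.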

| Characteristic | Traditional LLM Agent Frameworks | Agentic Programming (LLM-as-Code) |
| --- | --- | --- |
| Control Flow | LLM decides next action, tool calls, and stop | Program governs all control flow via deterministic code |
| LLM Role | Orchestrator with full autonomy | Adaptive component invoked only for reasoning/generation |
| Context Construction | Accumulates over steps, unbounded growth | Built from call tree (DAG), bounded by call depth |
| Reliability | Prone to token explosion, hallucination, incomplete tasks | Improved stability in long sequences |

The paper presents a case study of computer-use agents—such as those that automate GUI interactions. The authors found that the Agentic Programming design is "practical, not just a theoretical stance," and that it "substantially improve[s] the stability of long visual operation sequences."

For enterprise technology leaders evaluating AI agents for supply chain automation or logistics workflows, the findings suggest that architectural choices matter as much as model selection. By separating control flow from probabilistic reasoning, organizations can build agents that complete multi-step tasks with greater predictability. The LLM-as-Code approach keeps the flexibility of large language models where needed while ensuring that the overall process remains under deterministic governance.

The research was conducted by a team including Junjia Qi, Zichuan Fu, Jingtong Gao, Wenlin Zhang, Hanyu Yan, Wu, and Xiangyu Zhao. The full paper is available on arXiv.

As enterprises seek to deploy AI agents in production environments, from customs documentation to warehouse robotics, the stability gains reported for Agentic Programming could reduce operational risk. The paper offers a concrete architectural pattern aimed at the root causes of agent instability and a possible path toward more trustworthy autonomous systems.

