Model-Native Computing Architecture: Can Decades of Computer Architecture Wisdom Guide Next-Gen AI Systems?

A survey-style paper on arXiv proposes an analogy between large language model components and classical computer architecture, treating the LLM as a CPU and the context window as main memory. The authors introduce the Intelligent Computing Architecture (ICA), which has six functional layers and a dual-plane design, along with three Amdahl-style design heuristics: Semantic Locality, Context Budget, and Agent Speedup.

iGEN Editorial

June 17, 2026


A new paper from researchers including Lin, Pao, Hoilam, Zhan, Shaoxiong, Zheng, and Hai-Tao, published on arXiv, explores whether decades of computer architecture wisdom can guide the design of next-generation model-native systems. As large language models transition from model technology to system technology, the authors draw a detailed analogy: treating the LLM as a CPU, the KV cache as processor cache, the context window as main memory, and the agent framework as an operating system. According to the paper, engineering challenges such as cache reuse, context capacity, agent scheduling, and permission control mirror classical computer systems problems.

The paper proposes the Intelligent Computing Architecture (ICA), a unified framework of six functional layers with interface contracts and design axioms. The authors present it as resolving a central tension: whether the LLM resembles a CPU or an OS. Their answer is a dual-plane architecture comprising a probabilistic execution plane (what can be computed) and a deterministic control plane (what should be computed), with a graded crossover between the two planes at every layer.
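
To make the dual-plane split concrete, here is a minimal Python sketch. It is an illustration only, not the paper's interface: the names Action, ControlPlane, run_step, and stub_model are hypothetical, and the model is replaced by a fixed stub. The execution plane proposes an action (what can be computed), and a deterministic allow-list decides whether it runs (what should be computed).

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Action:
    """A tool call proposed by the execution plane (hypothetical type)."""
    tool: str
    argument: str


class ControlPlane:
    """Deterministic gate: the same action always gets the same verdict."""

    def __init__(self, allowed_tools: FrozenSet[str]) -> None:
        self.allowed_tools = allowed_tools

    def authorize(self, action: Action) -> bool:
        return action.tool in self.allowed_tools


def run_step(propose: Callable[[str], Action],
             control: ControlPlane,
             task: str) -> str:
    """Ask the execution plane for an action, then let the control plane decide."""
    action = propose(task)  # in a real system this would be a model call
    if not control.authorize(action):
        return f"denied: {action.tool}"
    return f"executed: {action.tool}({action.argument})"


def stub_model(task: str) -> Action:
    """Stand-in for an LLM; always proposes a 'search' call (assumption)."""
    return Action(tool="search", argument=task)


if __name__ == "__main__":
    print(run_step(stub_model, ControlPlane(frozenset({"search"})), "kv cache"))
    print(run_step(stub_model, ControlPlane(frozenset({"calculator"})), "kv cache"))
```

The sketch keeps the gate free of any model call, so the permission decision stays reproducible even though the proposal step would not be in a real system.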

To provide practical design guidance, the authors introduce three Amdahl-style design heuristics:

Semantic Locality: Groups data with similar semantic meaning to improve cache reuse and reduce latency.
Context Budget: Allocates limited context window capacity among competing agents or tasks.
Agent Speedup: Measures the performance gain from parallelizing agent execution.
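
The article does not reproduce the paper's formulas, so the sketch below shows only the classical Amdahl's law that the "Amdahl-style" label refers to, applied to agents as an assumption. Here parallel_fraction stands for the share of a task that can be split across agents and workers for the number of agents; both names are illustrative and not taken from the paper.

```python
import math


def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Classical Amdahl's law: S = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def speedup_ceiling(parallel_fraction: float) -> float:
    """Limit of the speedup as the number of workers grows without bound."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    serial_fraction = 1.0 - parallel_fraction
    return math.inf if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Hypothetical workload: 80% of the work can be spread across agents.
    for n in (1, 2, 4, 8, 16):
        print(n, round(amdahl_speedup(0.8, n), 2))
    print("ceiling", round(speedup_ceiling(0.8), 2))
```

Under this classical form, the serial share of the work caps the gain no matter how many agents are added, which is the kind of bound an Amdahl-style heuristic is meant to expose.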

The paper illustrates these heuristics with parameter ranges from published data but notes that predictive validation remains the principal open task. The authors also articulate analogy boundaries and differences between silicon and model-era architectures, proposing a research roadmap for the field.

As a conceptual and survey contribution, the paper does not present new experimental results. It synthesizes literature on LLM-as-OS designs, memory management, agent frameworks, tool protocols, multi-agent coordination, cognitive architectures, and safety governance, and finds that each line of work addresses a different layer without a unifying model. For CTOs and technology leaders exploring future system architectures, the ICA framework offers a structured way to think about scaling AI systems by borrowing established design principles from computer architecture.

Sources:

Model-Native Computing Architecture: Can Decades of Computer Architecture Wisdom Guide Next-Gen AI Systems?
