A new paper from researchers including Lin, Pao, Hoilam, Zhan, Shaoxiong, Zheng, and Hai-Tao, published on arXiv, explores whether decades of computer architecture wisdom can guide the design of next-generation model-native systems. As large language models transition from model technology to system technology, the authors draw a detailed analogy: treating the LLM as a CPU, the KV cache as processor cache, the context window as main memory, and the agent framework as an operating system. According to the paper, engineering challenges such as cache reuse, context capacity, agent scheduling, and permission control mirror classical computer systems problems.
The paper proposes the Intelligent Computing Architecture (ICA), a unified framework of six functional layers with interface contracts and design axioms. According to the authors, ICA resolves a central tension, namely whether the LLM resembles a CPU or an OS, through a dual-plane architecture: a probabilistic execution plane (what can be computed) and a deterministic control plane (what should be computed). The two planes meet at every layer, with a graded crossover between them.
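To make the dual-plane split concrete, here is a minimal Python sketch of one step in which a deterministic check gates a model-proposed action. It is an illustration only, not code or an interface from the paper: `ToolCall`, `ALLOWED_TOOLS`, `control_plane_permits`, `run_step`, and `stub_model` are all hypothetical names, and the stub stands in for a real, sampled LLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolCall:
    """An action proposed by the execution plane (hypothetical shape)."""
    tool: str
    argument: str


# Assumption: the control plane's policy is a fixed allow-list of tool names.
ALLOWED_TOOLS = frozenset({"search", "calculator"})


def control_plane_permits(call: ToolCall) -> bool:
    """Deterministic check: the same call always gets the same verdict."""
    return call.tool in ALLOWED_TOOLS


def run_step(propose: Callable[[str], ToolCall], prompt: str) -> str:
    """Let the execution plane propose an action, then gate it."""
    call = propose(prompt)  # in a real system this would be a sampled model output
    if not control_plane_permits(call):
        return f"denied: {call.tool}"
    return f"dispatched: {call.tool}({call.argument})"


def stub_model(prompt: str) -> ToolCall:
    """Stand-in for an LLM so the sketch runs without any model."""
    if "delete" in prompt:
        return ToolCall("shell", "ls")
    return ToolCall("search", prompt)


if __name__ == "__main__":
    print(run_step(stub_model, "weather in Paris"))
    print(run_step(stub_model, "delete files"))
```

The point of the sketch is the division of labor: what the model proposes may vary from run to run, while the decision about whether a proposal is allowed does not.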
To provide practical design guidance, the authors introduce three Amdahl-style design heuristics:
- Semantic Locality: Groups data with similar semantic meaning to improve cache reuse and reduce latency.
- Context Budget: Allocates limited context window capacity among competing agents or tasks.
- Agent Speedup: Measures the performance gain from parallelizing agent execution.
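This summary does not reproduce the paper's formulas, so the sketch below uses stand-ins: the classical Amdahl's law for the speedup from parallel agents, and a simple weight-proportional split for the context budget. The function names, the weights, and the 8,000-token window are assumptions chosen for illustration, not values from the paper.

```python
from typing import Dict


def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Classical Amdahl's law: 1 / ((1 - p) + p / n).

    parallel_fraction (p) is the share of work that can run concurrently;
    workers (n) is the number of agents running that share in parallel.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def allocate_context(total_tokens: int, weights: Dict[str, float]) -> Dict[str, int]:
    """Split a context window among agents in proportion to their weights.

    Shares are rounded down, so a few tokens may remain unallocated.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return {
        name: int(total_tokens * weight / total_weight)
        for name, weight in weights.items()
    }


if __name__ == "__main__":
    # Hypothetical workload: 80% of the task is parallelizable, 4 agents.
    print(round(amdahl_speedup(0.8, 4), 2))
    # Hypothetical 8,000-token window shared by three agents.
    print(allocate_context(8000, {"planner": 2, "retriever": 1, "coder": 1}))
```

With 80% of the work parallelizable, four agents give a speedup of about 2.5, and no number of agents can exceed 5, because the serial 20% bounds the result. That ceiling is the kind of limit an Amdahl-style heuristic is meant to expose.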
The paper illustrates these heuristics with parameter ranges drawn from published data but notes that predictive validation remains the principal open task. The authors also mark the boundaries of the analogy, describing where silicon-era and model-era architectures differ, and propose a research roadmap for the field.
As a conceptual and survey contribution, the paper does not present new experimental results. It synthesizes literature on LLM-as-OS designs, memory management, agent frameworks, tool protocols, multi-agent coordination, cognitive architectures, and safety governance, and finds that each line of work addresses a different layer with no unifying model connecting them. For CTOs and technology leaders exploring future system architectures, the ICA framework offers a structured way to think about scaling AI systems by borrowing proven design principles from computer architecture.