Enterprises deploying large language models (LLMs) face rapidly growing complexity in inference serving. Production systems now combine disaggregated execution, complex parallelism, runtime optimizations, and stateful workloads such as reasoning, agents, and reinforcement learning (RL) rollouts. According to a preprint posted on arXiv, existing simulators lack the architectural completeness and decision-grade fidelity needed to explore this design space. Their monolithic-replica abstractions are ill-suited to disaggregated serving, while average-case analytical proxies can distort SLA predictions and even reverse optimization conclusions.
Now, a team of researchers has introduced Frontier, a discrete-event simulator purpose-built for modern LLM inference serving. The preprint, titled "Frontier: Towards Comprehensive and Accurate LLM Inference Simulation", is authored by Yicheng Feng, Xin Tan, Yangtao Deng, Yimin Jiang, Yibo Zhu, and Hong Xu, and was posted on arXiv.
Disaggregated Abstraction and Key Optimizations
Frontier features a disaggregated abstraction that captures the structure and dynamics of modern serving systems. It models co-location, Prefill-Decode Disaggregation (PDD), and Attention-FFN Disaggregation (AFD) with role-specific cluster workers. The simulator incorporates key runtime optimizations within a scheduler-batch-engine loop, including CUDA Graphs and speculative decoding. It also supports stateful requests for emerging workloads like agents and RL rollouts.
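To make the idea of a discrete-event simulator with role-specific workers concrete, here is a minimal sketch of Prefill-Decode Disaggregation as an event loop. It is not Frontier's code or API: the function `simulate_pdd`, the `Event` class, the single-worker-per-role setup, and the fixed per-request service times are all assumptions chosen for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    time: float
    seq: int                              # tie-breaker for events at the same time
    kind: str = field(compare=False)      # "arrive", "prefill_done", or "decode_done"
    req_id: int = field(compare=False)


def simulate_pdd(arrivals, prefill_time, decode_time):
    """Toy PDD model (hypothetical): one prefill worker and one decode worker,
    each serving one request at a time in FIFO order.
    Returns {req_id: completion_time}."""
    events = []
    seq = 0
    for req_id, t in enumerate(arrivals):
        heapq.heappush(events, Event(t, seq, "arrive", req_id))
        seq += 1

    free_at = {"prefill": 0.0, "decode": 0.0}  # when each worker next becomes idle
    done = {}
    while events:
        ev = heapq.heappop(events)             # always process the earliest event
        if ev.kind == "arrive":
            start = max(ev.time, free_at["prefill"])
            free_at["prefill"] = start + prefill_time
            heapq.heappush(events, Event(free_at["prefill"], seq, "prefill_done", ev.req_id))
            seq += 1
        elif ev.kind == "prefill_done":
            # Hand-off to the decode worker; transfer cost is ignored in this sketch.
            start = max(ev.time, free_at["decode"])
            free_at["decode"] = start + decode_time
            heapq.heappush(events, Event(free_at["decode"], seq, "decode_done", ev.req_id))
            seq += 1
        else:
            done[ev.req_id] = ev.time
    return done


completions = simulate_pdd([0.0, 0.0, 1.0], prefill_time=1.0, decode_time=2.0)
print(completions)
```

Even this toy version shows why a per-role model matters: the decode worker, not the prefill worker, becomes the bottleneck here, which a single monolithic-replica abstraction would not expose. A real simulator would additionally model batching, KV-cache transfer, and the runtime optimizations described above.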
Accuracy Benchmarks
The researchers tested Frontier on a testbed of 16 H800 GPUs. The simulator achieved an average throughput error below 4%. Compared with state-of-the-art simulators, Frontier also reduced end-to-end latency error in both co-located and disaggregated scenarios:
| Scenario | End-to-end latency error, state-of-the-art simulators | End-to-end latency error, Frontier |
|---|---|---|
| Co-location | 44.9% | 6.4% |
| Disaggregation | 51.7% | 2.6% |
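The size of the improvement in the table above can be worked out directly. The short sketch below only does arithmetic on the reported percentages; the helper name `error_reduction` is hypothetical, and how the preprint defines the error metric itself is not covered here.

```python
# Reported end-to-end latency errors in percent, copied from the table above:
# (state-of-the-art simulators, Frontier)
reported = {
    "Co-location": (44.9, 6.4),
    "Disaggregation": (51.7, 2.6),
}


def error_reduction(baseline_pct, frontier_pct):
    """Return (percentage-point drop, reduction factor) between two error rates."""
    return baseline_pct - frontier_pct, baseline_pct / frontier_pct


for scenario, (baseline, frontier) in reported.items():
    drop, factor = error_reduction(baseline, frontier)
    print(f"{scenario}: {drop:.1f} points lower, about {factor:.1f}x smaller")
```

Under this arithmetic, the co-location error falls by 38.5 percentage points (roughly 7x smaller) and the disaggregation error by 49.1 points (roughly 20x smaller).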
Frontier scales to simulated clusters of over 1,000 GPUs while running on commodity CPUs, so large-scale cluster studies do not require expensive hardware.
New Use Cases for Enterprise Deployment
According to the preprint, Frontier enables several new use cases that directly benefit enterprise IT decision-makers:
- SLA-dependent Pareto frontier exploration: helps balance service-level agreements with cost.
- Heterogeneous disaggregated allocation: optimizes placement of different GPU types.
- Agentic reasoning scheduling validation: tests scheduling strategies for autonomous agent workloads.
- RL post-training reconfiguration: simulates changes in reinforcement learning training setups.
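As an illustration of the first use case, SLA-dependent Pareto exploration can be sketched as a filter over simulated configurations: discard those that miss the latency SLA, then keep only those that no other candidate beats on both cost and latency. The function `pareto_front`, the configuration names, and every number below are invented for illustration and are not results from the preprint.

```python
def pareto_front(configs, sla_latency_ms):
    """Keep configs that meet the SLA and are not dominated by another
    SLA-meeting config. Each config is (name, gpu_count, p99_latency_ms);
    lower is better on both axes."""
    feasible = [c for c in configs if c[2] <= sla_latency_ms]
    front = []
    for name, gpus, lat in feasible:
        dominated = any(
            (g2 <= gpus and l2 <= lat) and (g2 < gpus or l2 < lat)
            for _, g2, l2 in feasible
        )
        if not dominated:
            front.append((name, gpus, lat))
    return sorted(front, key=lambda c: c[1])


# Hypothetical simulator outputs, invented for illustration only.
candidates = [
    ("colocated-8", 8, 900.0),
    ("pdd-4p4d", 8, 620.0),
    ("pdd-6p6d", 12, 410.0),
    ("pdd-8p8d", 16, 400.0),
    ("colocated-16", 16, 450.0),
]

for name, gpus, lat in pareto_front(candidates, sla_latency_ms=700.0):
    print(f"{name}: {gpus} GPUs, p99 {lat:.0f} ms")
```

Tightening or loosening `sla_latency_ms` changes which configurations survive, which is why the frontier is described as SLA-dependent: the cheapest acceptable option depends on the latency target.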
These capabilities allow CTOs and infrastructure teams to simulate and validate serving architectures before committing to hardware purchases or configuration changes, reducing risk and improving resource efficiency.
The simulator is released as open source, enabling the broader community to adopt and extend it. By providing accurate, generalizable predictions of computation, communication, and memory costs across diverse serving scenarios, Frontier addresses a critical gap in the LLM deployment toolchain.