A properly calibrated rule-based autoscaler can beat every one of six mainstream deep reinforcement learning (DRL) algorithms on cost across every workload tested, according to a new benchmark study published on arXiv. The research, authored by Guilin Zhang, Chuanyi Sun, Kai Zhao, Xu, Shahryar Sarkani, and John Fossaceca, introduces RLScale-Bench, a reproducible benchmark and evaluation protocol for DRL on adaptive resource control, where an agent allocates compute to a dynamic workload under cost and service-level constraints.
The study evaluates six widely used DRL algorithms—PPO, DQN, A2C, SAC, TD3, and DDPG—under matched architectures, training budgets, and reward functions. These are compared against a carefully calibrated rule-based baseline across six workload patterns and five seeds, for a total of 240 runs. The benchmark is instantiated on Kubernetes Horizontal Pod Autoscaling, a common enterprise infrastructure task.
Three Key Findings
The research reports three principal findings that challenge common assumptions in the field:
- Calibrated controller beats DRL on cost: The calibrated rule-based controller achieves the lowest cost on all six workloads. However, it trails the best RL agents on bursty and flash traffic patterns.
- Discrete vs. continuous action spaces: Discrete-action algorithms incur one to two orders of magnitude fewer constraint violations than continuous-action ones, a gap the study attributes to action-space mismatch.
- No single dominant algorithm: No single DRL algorithm dominates across workloads; rankings shift by up to four positions depending on the workload pattern.
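To make the action-space mismatch in the second finding concrete, here is a minimal Python sketch of one way such a mismatch can arise. Replica counts are whole numbers, so a continuous policy output has to be scaled and rounded before it can be applied, and many distinct outputs collapse onto the same scaling decision. The action encodings, the `max_step` scaling, and the replica bounds are all illustrative assumptions; the article does not describe the action spaces the paper actually uses.

```python
# Hypothetical replica bounds, chosen only for illustration.
MIN_REPLICAS, MAX_REPLICAS = 1, 20


def clamp(replicas: int) -> int:
    """Keep the replica count inside the allowed range."""
    return max(MIN_REPLICAS, min(MAX_REPLICAS, replicas))


def apply_discrete(replicas: int, action: int) -> int:
    """Discrete action in {0, 1, 2}: scale down by one pod, hold, or scale up by one."""
    return clamp(replicas + (action - 1))


def apply_continuous(replicas: int, action: float, max_step: int = 3) -> int:
    """Continuous action in [-1, 1]: scaled to a step size, then rounded to whole pods."""
    return clamp(replicas + round(action * max_step))


# Two different continuous outputs produce the same (empty) scaling decision.
print(apply_continuous(5, 0.10), apply_continuous(5, 0.16))
# A discrete action maps one-to-one onto a scaling decision.
print(apply_discrete(5, 2))
```

In this sketch the discrete agent chooses directly among the decisions the cluster can execute, while the continuous agent's output passes through a lossy rounding step first.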
Benchmark Setup
RLScale-Bench is designed to provide a fair and reproducible comparison. The researchers matched training budgets, neural network architectures, and reward functions across all algorithms, and calibrated the rule-based baseline so that it represents a strong, practical competitor. The benchmark covers six distinct workload patterns, and each configuration was run with five random seeds to account for variability.
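For a rough picture of what a rule-based autoscaler looks like, the sketch below implements the core scaling rule from the Kubernetes Horizontal Pod Autoscaler documentation: desired replicas equal the ceiling of current replicas times the ratio of the observed metric to the target, with no change when that ratio is within a tolerance of 1. The function name and default values are illustrative assumptions, and this is not presented as the paper's calibrated baseline, whose details the article does not give.

```python
import math


def hpa_desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,      # illustrative default
    max_replicas: int = 10,     # illustrative default
    tolerance: float = 0.1,     # Kubernetes' documented default tolerance
) -> int:
    """Return the replica count suggested by the documented HPA rule."""
    ratio = current_metric / target_metric
    # Skip scaling when the metric is close enough to its target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# 4 pods at 90% average CPU against a 60% target.
print(hpa_desired_replicas(4, 90.0, 60.0))
```

Under this sketch, calibrating the controller would mean choosing the target metric, tolerance, and replica bounds for the workload at hand rather than accepting defaults.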
Implications for Enterprise Infrastructure
For enterprise technology leaders managing cloud costs and performance, the findings suggest that before investing in DRL for adaptive resource control, organizations should first ensure they have a well-tuned rule-based system. The bottleneck in RL-based resource control, according to the paper, is not algorithm selection but baseline calibration, reward engineering, and realistic evaluation protocols. Discrete-action DRL algorithms may be more suitable for environments with strict service-level constraints, given their lower violation rates.
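As an illustration of what reward engineering involves in this setting, the following sketch shows one hypothetical reward that trades resource cost against service-level violations. The function, its parameters (`slo_ms`, `cost_per_replica`, `violation_penalty`), and the use of latency as the service-level signal are all assumptions made for this example; the article does not specify the paper's reward function.

```python
def reward(
    replicas: int,
    latency_ms: float,
    slo_ms: float = 200.0,            # hypothetical latency objective
    cost_per_replica: float = 1.0,    # hypothetical cost unit per pod
    violation_penalty: float = 10.0,  # hypothetical weight on SLO breaches
) -> float:
    """Negative of resource cost plus a weighted, normalized SLO violation."""
    cost = cost_per_replica * replicas
    # Zero when latency meets the objective, otherwise the relative overshoot.
    violation = max(0.0, latency_ms - slo_ms) / slo_ms
    return -(cost + violation_penalty * violation)


print(reward(4, 150.0))  # objective met: only cost is charged
print(reward(4, 300.0))  # objective missed by 50%: cost plus penalty
```

The choice of `violation_penalty` decides how many extra replicas the agent is willing to pay for to avoid a breach, which is one reason the reward's weighting can matter as much as the choice of algorithm.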
Competitive Context
While the study does not name commercial products, its instantiation on Kubernetes Horizontal Pod Autoscaling makes RLScale-Bench relevant to cloud-native platforms that rely on it. Kubernetes HPA is widely used, and the benchmark offers a rigorous framework for evaluating whether replacing it with a DRL-based controller is worthwhile.
The full paper, accessible at https://arxiv.org/abs/2605.26418, includes details on the action spaces, workload generators, and validation methodology. It is licensed under a Creative Commons Attribution 4.0 International License.