When Does Deep RL Beat Calibrated Baselines? A Benchmark Study on Adaptive Resource Control

A research paper introduces RLScale-Bench, a reproducible benchmark for deep reinforcement learning on adaptive resource control. Testing six DRL algorithms and a calibrated rule-based baseline on Kubernetes autoscaling across six workload patterns, the study finds that the calibrated controller achieves the lowest cost on all workloads, though DRL agents perform better on bursty and flash traffic. Discrete-action DRL algorithms also significantly outperform continuous-action ones in constraint violations.

iGEN Editorial

June 16, 2026

A properly calibrated rule-based autoscaler can beat every one of six mainstream deep reinforcement learning (DRL) algorithms on cost across every workload tested, according to a new benchmark study published on arXiv. The research, by Guilin Zhang, Chuanyi Sun, Kai Zhao, Xu, Shahryar Sarkani, and John Fossaceca, introduces RLScale-Bench, a reproducible benchmark and evaluation protocol for DRL on adaptive resource control, a setting in which an agent allocates compute to a dynamic workload under cost and service-level constraints.

The study evaluates six widely used DRL algorithms—PPO, DQN, A2C, SAC, TD3, and DDPG—under matched architectures, training budgets, and reward functions. These are compared against a carefully calibrated rule-based baseline across six workload patterns and five seeds, for a total of 240 runs. The benchmark is instantiated on Kubernetes Horizontal Pod Autoscaling, a common enterprise infrastructure task.

Three Key Findings

The research reports three principal findings that challenge common assumptions in the field:

Calibrated controller beats DRL on cost: The calibrated rule-based controller achieves the lowest cost on all six workloads. However, it trails the best RL agents on bursty and flash traffic patterns.
Discrete vs. continuous action spaces: Discrete-action algorithms incur one to two orders of magnitude fewer constraint violations than continuous-action ones, a gap the paper attributes to action-space mismatch.
No single dominant algorithm: No single DRL algorithm dominates across workloads; rankings shift by up to four positions depending on the workload pattern.
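
One plausible reading of "action-space mismatch" is that replica counts are integers, so a continuous policy output has to be scaled, rounded, and clipped before it can be applied, while a discrete action maps directly onto a scaling step. The sketch below illustrates that idea only. The replica bounds, the delta set, and the function names are hypothetical and are not taken from the paper, whose actual action spaces are not described in this article.

```python
# Hypothetical replica bounds, chosen only for illustration.
MIN_REPLICAS, MAX_REPLICAS = 1, 20

# Hypothetical discrete action set: each index is an integer replica delta.
DISCRETE_DELTAS = (-2, -1, 0, 1, 2)


def clip_replicas(replicas: int) -> int:
    """Keep the replica count inside the allowed range."""
    return max(MIN_REPLICAS, min(MAX_REPLICAS, replicas))


def apply_discrete(current: int, action_index: int) -> int:
    """A discrete action selects an integer delta directly."""
    return clip_replicas(current + DISCRETE_DELTAS[action_index])


def apply_continuous(current: int, action: float, max_step: int = 2) -> int:
    """A continuous action in [-1, 1] is scaled, rounded, then clipped.

    Many distinct action values collapse onto the same integer delta,
    so small changes in the policy output may have no effect at all.
    """
    bounded = max(-1.0, min(1.0, action))
    delta = round(bounded * max_step)
    return clip_replicas(current + delta)


if __name__ == "__main__":
    print(apply_discrete(5, 4))        # scale up by two replicas
    print(apply_continuous(5, 0.2))    # rounds to a delta of zero
    print(apply_continuous(5, 0.3))    # rounds to a delta of one
```

Under these assumptions, the continuous agent has to learn where the rounding boundaries fall in addition to learning the scaling policy itself, which is one way a mismatch of this kind could make constraint violations more likely.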

Benchmark Setup

RLScale-Bench is designed to provide a fair and reproducible comparison. The researchers matched training budgets, neural network architectures, and reward functions across all algorithms. The rule-based baseline was carefully calibrated so that it represents a strong, practical competitor rather than a strawman. The workloads comprise six distinct patterns, and each configuration was repeated with five random seeds to account for run-to-run variability.
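
To make "rule-based baseline" concrete, here is a minimal sketch of the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler: the desired replica count is the current count multiplied by the ratio of observed to target metric, rounded up, with no change when the ratio falls within a tolerance band. The function name, default bounds, and example values are assumptions made for illustration. The article does not say how the paper's baseline was calibrated; in a rule like this, calibration would plausibly mean tuning the target, the tolerance, and the replica bounds.

```python
import math


def hpa_desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,   # Kubernetes' documented default tolerance
    min_replicas: int = 1,    # hypothetical bound for this sketch
    max_replicas: int = 10,   # hypothetical bound for this sketch
) -> int:
    """Return the replica count suggested by an HPA-style ratio rule."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the rule leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Example: 4 replicas at 90% average CPU against a 60% target.
    print(hpa_desired_replicas(4, 90.0, 60.0))
```

The rule has only a handful of parameters, which is part of why a well-tuned version can be a demanding baseline: there is little to overfit, and each parameter has a direct operational meaning.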

Implications for Enterprise Infrastructure

For enterprise technology leaders managing cloud costs and performance, the findings suggest that before investing in DRL for adaptive resource control, organizations should first ensure they have a well-tuned rule-based system. The bottleneck in RL-based resource control, according to the paper, is not algorithm selection but baseline calibration, reward engineering, and realistic evaluation protocols. Discrete-action DRL algorithms may be more suitable for environments with strict service-level constraints, given their lower violation rates.

Competitive Context

While the study does not name commercial products, the instantiation on Kubernetes Horizontal Pod Autoscaling positions RLScale-Bench as relevant to any cloud-native platform. Tools like Kubernetes HPA are widely used; this research provides a rigorous framework to evaluate whether replacing them with DRL-based controllers is worthwhile.

The full paper, accessible at https://arxiv.org/abs/2605.26418, includes details on the action spaces, workload generators, and validation methodology. It is licensed under a Creative Commons Attribution 4.0 International License.

Sources:

When Does Deep RL Beat Calibrated Baselines? A Benchmark Study on Adaptive Resource Control
