RAMS: Resource-Adaptive Model Switching for Embedded Edge Perception Under Load

Researchers present RAMS, a runtime controller that monitors device pressure and dynamically selects among three YOLOv8 tiers on embedded hardware, achieving up to 5.6x faster inference than a fixed medium model while retaining 74% of its accuracy. The system introduces a detection-conditioned switching policy and a new scalar metric, SWAS, for offline policy comparison.

iGEN Editorial

June 16, 2026


Edge object detection on embedded hardware faces a fundamental trade-off: inference latency must stay low under fluctuating resource pressure, yet detection quality cannot be allowed to collapse. In a paper posted to arXiv, Kushal Khemani, Evan Leri, George Xu, and Amit Hod propose RAMS (Resource-Adaptive and Detection-Conditioned Model Switching), a lightweight runtime controller that dynamically selects among three YOLOv8 tiers kept resident in memory, so switching incurs no model-reload latency. The approach targets deployments where CPU, memory, or power constraints change unpredictably, from robotics to autonomous vehicles.

How RAMS Works

RAMS monitors device pressure (e.g., CPU load, memory usage) and calibrates switching thresholds based on idle behaviour. It manages three YOLOv8 models: NANO (320×320 px), SMALL (416×416 px), and MEDIUM (640×640 px). Five switching policies are defined, including two detection-conditioned variants that prevent aggressive model downgrades when recent vulnerable-road-user (VRU) detections have occurred. The same controller equations operate across a 37× latency range, as demonstrated on Raspberry Pi 5, x86 laptops, and Jetson Orin with ONNX and TensorRT runtimes.
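As a rough illustration of detection-conditioned switching, the following minimal Python sketch steps down a tier when pressure is high, unless a VRU was seen recently, and steps back up when pressure eases. It is not the authors' controller: the class name, the two pressure thresholds, and the hold-window length are all hypothetical, and the thresholds are fixed constants here rather than calibrated from idle behaviour.

```python
from dataclasses import dataclass

TIERS = ("NANO", "SMALL", "MEDIUM")  # tier names and ordering taken from the article


@dataclass
class TierController:
    """Hypothetical pressure-driven tier selector with a post-VRU hold."""
    high_pressure: float = 0.80      # assumed: downgrade when pressure exceeds this
    low_pressure: float = 0.50       # assumed: upgrade when pressure falls below this
    vru_hold_frames: int = 10        # assumed: frames to block downgrades after a VRU
    tier: int = 2                    # index into TIERS; start on MEDIUM
    frames_since_vru: int = 10**9    # large value means "no VRU seen yet"

    def step(self, pressure: float, vru_detected: bool) -> str:
        """Consume one frame's pressure reading and detection flag; return the tier to use."""
        if vru_detected:
            self.frames_since_vru = 0
        else:
            self.frames_since_vru += 1
        hold = self.frames_since_vru < self.vru_hold_frames

        if pressure > self.high_pressure and not hold and self.tier > 0:
            self.tier -= 1           # shed load by moving to a cheaper tier
        elif pressure < self.low_pressure and self.tier < len(TIERS) - 1:
            self.tier += 1           # pressure has eased, recover accuracy
        return TIERS[self.tier]


if __name__ == "__main__":
    ctrl = TierController()
    for pressure, vru in [(0.9, False), (0.9, True), (0.9, False), (0.3, False)]:
        print(pressure, vru, ctrl.step(pressure, vru))
```

The gap between the two thresholds acts as a simple hysteresis band so the controller does not oscillate between tiers when pressure hovers near a single cut-off; whether RAMS uses such a band is not stated in the article.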

The VRU-Weighted Accuracy Score (SWAS)

To compare policies offline without ground-truth annotations, the team introduced the VRU-Weighted Accuracy Score (SWAS), a scalar metric that weights detection accuracy by the presence of vulnerable road users. Because a score derived from the detector’s own outputs is partly circular, the team also reports an oracle-bounded variant that separates this circularity from the genuine benefit of retaining a higher tier. Under heavy load on Jetson Orin with TensorRT, detection-conditioned switching improved SWAS by 25.4% (oracle scoring) and 47.3% (detector-derived scoring) relative to threshold-only policies.
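The article does not give the SWAS formula, so the sketch below only illustrates the general idea of weighting per-frame accuracy by VRU presence. The function name, the per-frame "agrees with reference" flag, and the weight of 3.0 are all hypothetical and should not be read as the paper's definition.

```python
def weighted_agreement(frames, vru_weight=3.0):
    """Hypothetical VRU-weighted score in [0, 1].

    frames: iterable of (agrees, has_vru) pairs, where `agrees` says whether the
    selected tier's output matched the reference for that frame and `has_vru`
    says whether the reference contains a vulnerable road user.
    vru_weight: assumed multiplier applied to VRU-positive frames.
    """
    total = 0.0
    score = 0.0
    for agrees, has_vru in frames:
        weight = vru_weight if has_vru else 1.0
        total += weight
        if agrees:
            score += weight
    return score / total if total else 0.0


if __name__ == "__main__":
    frames = [(True, False), (False, True), (True, True), (True, False)]
    print(weighted_agreement(frames))                  # VRU frames count three times
    print(weighted_agreement(frames, vru_weight=1.0))  # plain unweighted agreement
```

In this toy version, the reference could be an oracle or the detector's own output; using the latter is what makes a score of this kind partly circular.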

Performance Benchmarks

Policy	Mean Latency (ms)	Relative Speedup vs Fixed-MEDIUM	Proxy Accuracy Retained	SWAS Improvement (detector-derived)
Threshold-only	Not reported	—	—	Baseline
safety2 (detection-conditioned)	3.41	5.6× faster	74%	+47.3%

On the Jetson Orin under heavy load, the safety2 policy achieved a mean latency of 3.41 ms, 5.6× faster than fixed-MEDIUM inference, while retaining 74% of fixed-MEDIUM’s proxy accuracy. It does so by running near NANO most of the time and locking to SMALL or MEDIUM selectively during VRU-positive windows. Live evaluation on the KITTI dataset reported per-tier VRU recall of 24.2% for NANO, 41.2% for SMALL, and 59.0% for MEDIUM, indicating that reactive overrides are fundamentally limited by the baseline detector’s recall: an override can only fire after a VRU has actually been detected.
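The reported figures allow a quick back-of-the-envelope check. The short Python snippet below derives the fixed-MEDIUM latency implied by the 3.41 ms and 5.6× figures, and expresses each tier's KITTI VRU recall as a fraction of MEDIUM's. These derived values are not stated in the article and inherit the rounding of its inputs.

```python
# Figures quoted in the article
safety2_latency_ms = 3.41
speedup_vs_medium = 5.6
vru_recall = {"NANO": 24.2, "SMALL": 41.2, "MEDIUM": 59.0}  # percent, KITTI live evaluation

# Implied fixed-MEDIUM latency (approximate, since 5.6x is itself rounded)
implied_medium_latency_ms = safety2_latency_ms * speedup_vs_medium

# Each tier's VRU recall relative to MEDIUM
relative_recall = {tier: r / vru_recall["MEDIUM"] for tier, r in vru_recall.items()}

if __name__ == "__main__":
    print(f"implied fixed-MEDIUM latency: about {implied_medium_latency_ms:.1f} ms")
    for tier, ratio in relative_recall.items():
        print(f"{tier}: {ratio:.0%} of MEDIUM VRU recall")
```

Note that NANO's VRU recall is well under half of MEDIUM's, whereas the 74% figure refers to proxy accuracy retained by the switching policy as a whole; the two numbers measure different things and are not directly comparable.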

Implications for Embedded Perception

The ability to switch models at runtime without reload latency and to condition decisions on prior detections offers a practical path for embedded vision systems that must operate under uncertain resource budgets. While the current work focuses on autonomous driving scenarios (VRU detection), the same architecture applies to any edge perception task where latency and accuracy must be balanced dynamically. Future work could extend the policy set or integrate real-time resource forecasting.

