Edge object detection on embedded hardware faces a fundamental trade-off: inference latency must stay low under fluctuating resource pressure while detection quality is maintained. In an arXiv preprint, Kushal Khemani, Evan Leri, George Xu, and Amit Hod propose RAMS (Resource-Adaptive and Detection-Conditioned Model Switching), a lightweight runtime controller that dynamically selects among three resident YOLOv8 tiers without model-reload latency. The approach targets any deployment where CPU, memory, or power constraints change unpredictably, from robotics to autonomous vehicles.
How RAMS Works
RAMS monitors device pressure (e.g., CPU load, memory usage) and calibrates switching thresholds based on idle behaviour. It manages three YOLOv8 models: NANO (320×320 px), SMALL (416×416 px), and MEDIUM (640×640 px). Five switching policies are defined, including two detection-conditioned variants that prevent aggressive model downgrades when recent vulnerable-road-user (VRU) detections have occurred. The same controller equations operate across a 37× latency range, as demonstrated on Raspberry Pi 5, x86 laptops, and Jetson Orin with ONNX and TensorRT runtimes.
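The switching idea described above can be illustrated with a minimal sketch. It is not the authors' controller: the class name `Controller`, the normalised pressure value in [0, 1], the threshold margins, and the `hold_frames` lock length are all hypothetical choices made for illustration, and only one threshold-only rule plus one detection-conditioned veto is shown rather than the five policies in the paper.

```python
from dataclasses import dataclass

# Tiers ordered from cheapest to most accurate, as named in the article.
TIERS = ("NANO", "SMALL", "MEDIUM")


@dataclass
class Controller:
    """Hypothetical tier selector; all default values are assumptions."""
    low: float = 0.5       # below this pressure, MEDIUM is allowed
    high: float = 0.8      # at or above this pressure, NANO is proposed
    hold_frames: int = 5   # frames a VRU detection blocks downgrades
    tier: int = 2          # index into TIERS; start on MEDIUM
    lock: int = 0          # remaining frames of downgrade protection

    def calibrate(self, idle_samples, margin=0.2):
        """Place both thresholds relative to pressure measured while idle."""
        base = sum(idle_samples) / len(idle_samples)
        self.low = min(1.0, base + margin)
        self.high = min(1.0, base + 2 * margin)

    def step(self, pressure, vru_detected):
        """Return the tier name to use for the next frame."""
        # Threshold-only proposal from current device pressure.
        if pressure >= self.high:
            target = 0
        elif pressure >= self.low:
            target = 1
        else:
            target = 2

        # Detection-conditioned part: a VRU detection refreshes the lock,
        # otherwise the lock counts down by one frame.
        if vru_detected:
            self.lock = self.hold_frames
        elif self.lock > 0:
            self.lock -= 1

        # While the lock is active, veto any downgrade; upgrades still pass.
        if self.lock > 0 and target < self.tier:
            target = self.tier

        self.tier = target
        return TIERS[self.tier]


if __name__ == "__main__":
    ctrl = Controller()
    ctrl.calibrate([0.1, 0.1, 0.1])
    print(ctrl.step(0.2, False))   # light load
    print(ctrl.step(0.9, True))    # heavy load, VRU seen: downgrade vetoed
    print([ctrl.step(0.9, False) for _ in range(5)])  # lock expires
```

Because the sketch only changes an index, it mirrors the article's point that all three models stay resident and no reload happens on a switch. Setting `hold_frames=0` reduces it to a threshold-only rule, which makes the two behaviours easy to compare on the same pressure trace.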
The VRU-Weighted Accuracy Score (SWAS)
To compare policies offline without ground-truth annotations, the team introduced the VRU-Weighted Accuracy Score (SWAS), a scalar metric that weights detection accuracy by the presence of vulnerable road users. An oracle-bounded variant of SWAS separates the circularity of the detector’s own outputs from genuine tier-retention benefit. Under heavy load on Jetson Orin TensorRT, detection-conditioned switching improved SWAS by 25.4% (oracle scoring) and 47.3% (detector-derived scoring) relative to threshold-only policies.
Performance Benchmarks
| Policy | Mean Latency (ms) | Relative Speedup vs Fixed-MEDIUM | Proxy Accuracy Retained | SWAS Improvement (detector-derived) |
|---|---|---|---|---|
| Threshold-only | Not reported | — | — | Baseline |
| safety2 (detection-conditioned) | 3.41 | 5.6× faster | 74% | +47.3% |
On the Jetson Orin under heavy load, the safety2 policy achieved a mean latency of 3.41 ms, 5.6× faster than fixed-MEDIUM inference, while retaining 74% of fixed-MEDIUM’s proxy accuracy. It did so by operating near NANO most of the time and locking selectively to SMALL or MEDIUM during VRU-positive windows. Live evaluation on the KITTI dataset reported per-tier VRU recall of 24.2% for NANO, 41.2% for SMALL, and 59.0% for MEDIUM, indicating that reactive overrides are fundamentally limited by the baseline detector’s recall: an override can only trigger on a VRU that the active tier actually detects.
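As a back-of-envelope check on these figures, the short calculation below derives two numbers the article does not state directly: the fixed-MEDIUM latency implied by the reported speedup, and the per-tier VRU miss rate implied by the reported recall. Both are computed from rounded values, so they should be read as approximations rather than reported results.

```python
# Figures quoted in the article (Jetson Orin, heavy load; KITTI live evaluation).
safety2_latency_ms = 3.41
speedup_vs_medium = 5.6
vru_recall = {"NANO": 0.242, "SMALL": 0.412, "MEDIUM": 0.590}

# Implied fixed-MEDIUM latency: an estimate from rounded inputs,
# not a value reported in the article.
implied_medium_ms = safety2_latency_ms * speedup_vs_medium

# Miss rate per tier is simply 1 - recall.
vru_miss = {tier: 1.0 - r for tier, r in vru_recall.items()}

print(f"implied fixed-MEDIUM latency: {implied_medium_ms:.1f} ms")
for tier, miss in vru_miss.items():
    print(f"{tier}: misses {miss:.1%} of VRUs")
```

The implied fixed-MEDIUM latency comes out at roughly 19.1 ms. The miss rates (about 75.8% for NANO, 58.8% for SMALL, and 41.0% for MEDIUM) illustrate why a policy that mostly runs NANO has limited opportunity to react: most VRUs never produce the detection that would trigger an upward lock.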
Implications for Embedded Perception
Switching models at runtime without reload latency, and conditioning those decisions on prior detections, offers a practical path for embedded vision systems that must operate under uncertain resource budgets. While the current work focuses on autonomous driving scenarios (VRU detection), the same architecture could in principle apply to other edge perception tasks where latency and accuracy must be balanced dynamically. Future work could extend the policy set or integrate real-time resource forecasting.