Artificial Intelligence #policy regret #embedding model
Low-Policy-Regret Algorithm for Embedding Model Routing in Contextual Bandits
A new paper on arXiv formalizes embedding model routing as an adversarial contextual linear bandit problem. The authors propose Hypentropy Policy Gradient (HPG), which provably adapts to unknown low-rank structure and attains low linearized policy regret.
Jun 16, 2026