Artificial Intelligence #retailbench #llm
RetailBench Benchmark Tests LLM Agents on Long-Horizon Retail Decisions
Researchers introduced RetailBench, a simulation benchmark that evaluates LLM agents on managing a single supermarket over a 180-day horizon. In tests on seven models, only a subset completed the full horizon, and even the best performer fell far short of an oracle policy because of incomplete evidence acquisition and the lack of a consistent strategy.
Jun 16, 2026 · 2 sources