Artificial Intelligence #toolmenubench #benchmarking
ToolMenuBench: New Benchmark Evaluates Tool-Menu Filtering for Reliable and Efficient LLM Agents
ToolMenuBench is a new benchmark that evaluates how tool-menu filtering strategies affect the reliability and efficiency of LLM agents. In tests across seven model backends, causal minimal tool filtering raised task success from 32.1% to 85.7% while cutting token usage by roughly 98%.
Jun 16, 2026
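To put the headline figures in perspective, the short sketch below works out what they imply. Only the three input numbers come from the summary. The function name `summarize_gains` and the derived quantities are illustrative assumptions, not part of the benchmark, and "roughly 98%" is treated as exactly 98% for the arithmetic.

```python
# Figures quoted in the summary; everything derived from them is plain arithmetic.
BASELINE_SUCCESS = 32.1   # percent task success without filtering
FILTERED_SUCCESS = 85.7   # percent task success with causal minimal tool filtering
TOKEN_REDUCTION = 98.0    # percent; reported only as "roughly 98%"


def summarize_gains(baseline: float, filtered: float, reduction_pct: float) -> dict:
    """Hypothetical helper: derive absolute and relative changes from the quoted figures."""
    remaining_fraction = 1.0 - reduction_pct / 100.0  # share of tokens still used
    return {
        # Difference in percentage points, not a relative percent change.
        "absolute_gain_points": round(filtered - baseline, 1),
        # How many times higher the filtered success rate is.
        "relative_gain_factor": round(filtered / baseline, 2),
        # How many times fewer tokens are used after the reduction.
        "token_compression_factor": round(1.0 / remaining_fraction, 1),
    }


gains = summarize_gains(BASELINE_SUCCESS, FILTERED_SUCCESS, TOKEN_REDUCTION)
for name, value in gains.items():
    print(f"{name}: {value}")
```

Under these assumptions, the reported change is a gain of about 53.6 percentage points (a success rate roughly 2.67 times the baseline), and a 98% token reduction corresponds to using about one fiftieth of the original tokens.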