Artificial Intelligence #labosbench #benchmarking
LabOSBench: New Benchmark Tests AI Agents on Complex Scientific Instrument Control
LabOSBench is a new benchmark designed to evaluate computer-use agents on scientific instrument control. It features 96 subtasks across eight simulated instruments, testing agents on sample loading, alignment, parameter tuning, data acquisition, and result inspection. Early results show that while agents handle structured GUI tasks well, they struggle with feedback-driven operations and long-horizon workflows.
Jun 16, 2026 1 source