Artificial Intelligence #benchmark #text-to-video
BRITE Benchmark Reveals Critical Gaps in Text-to-Video Models' Object-Action Binding and Audio-Visual Sync
A new benchmark called BRITE provides the first unified framework for evaluating text-to-video (T2V) models on implausible prompts, audio-visual consistency, and interpretable question-answering (QA) based assessment. In tests of five state-of-the-art models, including Sora 2 and Veo 3.1, BRITE shows that the models handle static object composition well but perform markedly worse on object-action binding and audio-visual synchronization.
Jun 16, 2026