Topic
quality assessment
New Multi-Scale Two-Stream Framework Aims to Decouple Semantics from Distortions in AI-Generated Image Quality Assessment
Researchers introduce MST-CLIPIQA, a multi-scale two-stream vision-language framework that decouples semantic understanding from distortion detection to improve AI-generated image quality assessment. The method uses dual CLIP encoders and an information bottleneck gated fusion mechanism, achieving state-of-the-art results on five benchmarks with only 0.8 million trainable parameters.
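To make the idea of gated fusion concrete, here is a minimal sketch of how two feature streams could be blended with a learned per-dimension gate. It is an illustration under stated assumptions, not the MST-CLIPIQA implementation: the function name `gated_fusion`, the parameters `weights` and `bias`, and the convex-blend formula are all hypothetical, and the information bottleneck term and the CLIP encoders themselves are omitted.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(semantic, distortion, weights, bias):
    """Blend two equal-length feature vectors with a per-dimension gate.

    Hypothetical sketch: gate[i] = sigmoid(weights[i] . [semantic; distortion] + bias[i])
    and fused[i] = gate[i] * semantic[i] + (1 - gate[i]) * distortion[i].
    """
    if len(semantic) != len(distortion):
        raise ValueError("both streams must have the same dimensionality")
    concat = list(semantic) + list(distortion)
    fused = []
    for i in range(len(semantic)):
        # One gate value per output dimension, computed from both streams.
        logit = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        gate = sigmoid(logit)
        fused.append(gate * semantic[i] + (1.0 - gate) * distortion[i])
    return fused


# Toy inputs (made-up numbers): zero weights and bias give a gate of 0.5,
# so the fused vector is the plain average of the two streams.
semantic_features = [1.0, 2.0]
distortion_features = [3.0, 4.0]
zero_weights = [[0.0] * 4 for _ in range(2)]
fused = gated_fusion(semantic_features, distortion_features, zero_weights, [0.0, 0.0])
```

In a trained model the gate parameters would be learned, letting the network decide per dimension how much to rely on semantic versus distortion features.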
NVMOS: Novel AI Model Predicts Perceptual Quality of Non-Verbal Vocalizations in Speech
A new paper on arXiv introduces NVMOS, described as the first model purpose-built to assess the perceptual quality of non-verbal vocalizations (NVs) such as laughter, sighs, and coughs in speech. The model was trained on a newly constructed NV-MOS dataset with expert ratings and is reported to achieve expert-level agreement with human Mean Opinion Scores (MOS). In the authors' tests, multimodal LLMs such as Gemini gave clearly inconsistent results on the same task, which the paper presents as evidence that specialized NV quality assessment is needed.
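Agreement between a model's predictions and human Mean Opinion Scores is commonly summarized with a rank correlation. The sketch below computes Spearman's rank correlation in plain Python. The summary above does not say which agreement metric the paper uses, so the choice of Spearman correlation is an assumption, and the scores are made-up illustrative numbers rather than results from the paper.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five clips (illustrative only).
human_mos = [4.5, 3.0, 2.0, 4.0, 1.5]
model_scores = [4.2, 3.1, 2.4, 3.9, 1.0]
agreement = spearman(human_mos, model_scores)
```

A value near 1 means the model orders the clips the same way the human raters do; a value near 0 means the two orderings are essentially unrelated.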