What question did this study set out to answer?

The study aims to improve video quality assessment by addressing two limitations of existing methods: weak evaluation of samples at the quality extremes, and insufficient sensitivity to nuanced quality variations.

January 26, 2026

RAM-VQA: Restoration Assisted Multi-modality Video Quality Assessment

Key Points

The study aims to improve video quality assessment by addressing two limitations of existing methods: weak evaluation of samples at the quality extremes, and insufficient sensitivity to nuanced quality variations.
The authors developed the RAM-VQA framework, which uses video restoration as a proxy for modeling distortion-sensitive features.
A prompt learning stage constructs a quality-aware textual space from degraded, restored, and pristine references derived from the restoration process.
A dual-branch evaluation stage integrates semantic cues with technical quality indicators through spatio-temporal differential analysis.
RAM-VQA achieved state-of-the-art performance across diverse benchmarks, performing particularly well on extreme-quality content.
It also demonstrated robust generalization relative to existing methods.

Abstract

Video Quality Assessment (VQA) strives to computationally emulate human perceptual judgments and has garnered significant attention given its widespread applicability. However, existing methodologies face two primary impediments: (1) limited proficiency in evaluating samples at quality extremes (e.g., severely degraded or near-perfect videos), and (2) insufficient sensitivity to nuanced quality variations arising from a misalignment with human perceptual mechanisms. Although vision-language models offer promising semantic understanding, their reliance on visual encoders pre-trained for high-level tasks often compromises their sensitivity to low-level distortions. To surmount these challenges, we propose the Restoration-Assisted Multi-modality VQA (RAM-VQA) framework. Uniquely, our approach leverages video restoration as a proxy to explicitly model distortion-sensitive features. The framework operates through two synergistic stages: a prompt learning stage that constructs a quality-aware textual space using triple-level references (degraded, restored, and pristine) derived from the restoration process, and a dual-branch evaluation stage that integrates semantic cues with technical quality indicators via spatio-temporal differential analysis. Extensive experiments demonstrate that RAM-VQA achieves state-of-the-art performance across diverse benchmarks, exhibiting superior capability in handling extreme-quality content while ensuring robust generalization.
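The abstract does not give formulas for the dual-branch stage, so the following is only a minimal illustrative sketch of what a spatio-temporal differential between a degraded video and its restored counterpart could look like, and how such a technical indicator might be combined with a semantic score. Everything here is an assumption made for illustration and not the authors' implementation: the function names, the mean-absolute-difference measure, the clamping to [0, 1], and the weighted-sum fusion with its `weight` parameter are all hypothetical. Videos are represented as plain nested lists of grayscale frames.

```python
from statistics import fmean

Frame = list[list[float]]  # H x W grayscale frame, values assumed in [0, 1]
Video = list[Frame]        # sequence of T frames


def frame_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two same-sized frames."""
    return fmean(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


def spatial_residual(degraded: Video, restored: Video) -> float:
    """Average per-frame gap between the degraded input and its restoration."""
    return fmean(frame_abs_diff(d, r) for d, r in zip(degraded, restored))


def temporal_residual(degraded: Video, restored: Video) -> float:
    """Average gap between the frame-to-frame change of the two videos."""
    if len(degraded) < 2:
        return 0.0  # a single frame carries no temporal information
    gaps = []
    for t in range(1, len(degraded)):
        d_motion = frame_abs_diff(degraded[t], degraded[t - 1])
        r_motion = frame_abs_diff(restored[t], restored[t - 1])
        gaps.append(abs(d_motion - r_motion))
    return fmean(gaps)


def fused_quality(semantic_score: float, degraded: Video, restored: Video,
                  weight: float = 0.5) -> float:
    """Hypothetical fusion: weighted sum of a semantic score in [0, 1] and a
    technical score that shrinks as the restoration changes the video more."""
    residual = spatial_residual(degraded, restored) + temporal_residual(degraded, restored)
    technical_score = 1.0 - min(1.0, residual)
    return weight * semantic_score + (1.0 - weight) * technical_score
```

In this sketch, a video that the restoration step leaves unchanged gets a technical score of 1.0, while a video that restoration alters heavily gets a score near 0. A learned model would replace both the hand-written residuals and the fixed weighted sum.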
