Video Quality Assessment (VQA) strives to computationally emulate human perceptual judgments and has garnered significant attention given its widespread applicability. However, existing methodologies face two primary impediments: (1) limited proficiency in evaluating samples at quality extremes (e.g., severely degraded or near-perfect videos), and (2) insufficient sensitivity to nuanced quality variations arising from a misalignment with human perceptual mechanisms. Although vision-language models offer promising semantic understanding, their reliance on visual encoders pre-trained for high-level tasks often compromises their sensitivity to low-level distortions. To surmount these challenges, we propose the Restoration-Assisted Multi-modality VQA (RAM-VQA) framework. Uniquely, our approach leverages video restoration as a proxy to explicitly model distortion-sensitive features. The framework operates through two synergistic stages: a prompt learning stage that constructs a quality-aware textual space using triple-level references (degraded, restored, and pristine) derived from the restoration process, and a dual-branch evaluation stage that integrates semantic cues with technical quality indicators via spatio-temporal differential analysis. Extensive experiments demonstrate that RAM-VQA achieves state-of-the-art performance across diverse benchmarks, exhibiting superior capability in handling extreme-quality content while ensuring robust generalization.
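The abstract does not specify how the spatio-temporal differential analysis is computed, so the following is only a minimal illustrative sketch, not the authors' implementation. It assumes grayscale videos stored as nested lists of pixel values in [0, 1], and it measures (a) the per-frame gap between a degraded video and its restored version and (b) the gap between their frame-to-frame changes. All function names (`mean_abs_diff`, `spatial_residuals`, `temporal_residuals`, `distortion_summary`) are hypothetical, and the mean absolute difference is a stand-in for whatever learned features the framework actually uses.

```python
from typing import Dict, List

Frame = List[List[float]]  # H x W grayscale frame, values assumed in [0, 1]
Video = List[Frame]        # sequence of T frames


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def spatial_residuals(degraded: Video, restored: Video) -> List[float]:
    """Per-frame gap between the input video and its restored version."""
    return [mean_abs_diff(d, r) for d, r in zip(degraded, restored)]


def temporal_residuals(degraded: Video, restored: Video) -> List[float]:
    """Per-transition gap between the frame-to-frame change of the input
    and the frame-to-frame change of the restored video."""
    out = []
    for t in range(1, len(degraded)):
        motion_d = mean_abs_diff(degraded[t], degraded[t - 1])
        motion_r = mean_abs_diff(restored[t], restored[t - 1])
        out.append(abs(motion_d - motion_r))
    return out


def distortion_summary(degraded: Video, restored: Video) -> Dict[str, float]:
    """Average the spatial and temporal residuals into two scalar indicators
    (an assumed pooling choice made for illustration only)."""
    s = spatial_residuals(degraded, restored)
    t = temporal_residuals(degraded, restored)
    return {
        "spatial": sum(s) / len(s),
        "temporal": sum(t) / len(t) if t else 0.0,
    }


# Toy example: a flickering 1x2 clip versus a stable "restored" clip.
degraded_clip = [[[0.0, 0.0]], [[1.0, 1.0]]]
restored_clip = [[[0.5, 0.5]], [[0.5, 0.5]]]
summary = distortion_summary(degraded_clip, restored_clip)
```

In this toy setup, larger residuals indicate that restoration changed the video more, which is one simple way a restoration output could serve as a proxy for distortion severity; an actual system would feed such differences into a learned quality predictor rather than report them directly.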
Pengfei Chen
Jiebin Yan
Rajiv Soundararajan
IEEE Transactions on Image Processing
Centre National de la Recherche Scientifique
Université Paris-Saclay
Indian Institute of Science Bangalore
Chen et al. studied this question.
www.synapsesocial.com/papers/69770370722626c4468e8853 — DOI: https://doi.org/10.1109/tip.2026.3655117