Action Quality Assessment (AQA) has recently gained prominence as a research area, driven by advances in deep learning and growing demand for automated, objective evaluation of how well human actions are executed. Unlike traditional action recognition, which identifies what action is performed, AQA quantifies performance quality, enabling practical applications in domains such as sports evaluation, rehabilitation monitoring, and professional skill assessment. This survey systematically reviews deep learning-based methods for video-based AQA, categorizing existing approaches by data modality, learning paradigm, and evaluation granularity. Specifically, the review covers single-modality approaches and emerging multi-modal approaches that integrate visual, auditory, and textual information, as well as supervised, self-supervised, and contrastive learning frameworks. Key benchmark datasets used in AQA research are analyzed with emphasis on their scope, representativeness, and annotation characteristics. Furthermore, critical challenges confronting the field, including limited model generalization, ambiguity in scoring annotations, and interpretability concerns, are identified and discussed. Potential future research directions that could address these limitations and advance practical AQA deployment are also proposed.
Zhang et al. (Mon,) studied this question.