Action quality assessment (AQA) is more challenging than action recognition because models must judge quality from fine-grained differences between actions. Current mainstream approaches formalize the problem as a regression task over spatio-temporal video features. However, most previous methods ignore that motion performance at different stages or time points may differ in importance for the assessed quality. We therefore propose an action quality assessment method that uses a multi-scale temporal attention mechanism to assign appropriate weights to different time steps. In addition, we apply an LSTM-like MLP structure to address video feature fusion and a smooth labeling strategy to address subjective noise in AQA datasets. Compared with the current state-of-the-art method, CoRe, our method improves by 1.84% on the AQA-7 dataset and 0.91% on the MTL-AQA dataset.
Wang et al. studied this question.
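To make the idea of weighting time steps concrete, the following is a minimal sketch of attention-based temporal pooling at several temporal scales. It is not the authors' implementation: the function names (`temporal_attention_pool`, `multi_scale_pool`), the single scoring vector `w`, the scales `(1, 2, 4)`, and the plain averaging across scales are all assumptions chosen for illustration, and the random features stand in for real per-clip video features.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def temporal_attention_pool(features, w, scale=1):
    """Pool (T, D) features into one D-vector using attention weights.

    Hypothetical helper: `scale` first averages consecutive groups of
    `scale` time steps, so larger scales attend over coarser stages.
    """
    T, D = features.shape
    n = T // scale  # trailing steps that do not fill a group are dropped
    coarse = features[: n * scale].reshape(n, scale, D).mean(axis=1)
    weights = softmax(coarse @ w)  # one weight per (coarse) time step
    return weights @ coarse, weights

def multi_scale_pool(features, w, scales=(1, 2, 4)):
    # Assumption: combine scales by simple averaging of pooled vectors.
    pooled = [temporal_attention_pool(features, w, s)[0] for s in scales]
    return np.mean(pooled, axis=0)

# Stand-in data: 8 time steps, 16-dimensional features.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16))
w = rng.normal(size=16)

video_vec = multi_scale_pool(features, w)
_, weights = temporal_attention_pool(features, w, scale=2)
```

In this sketch the attention weights at each scale are non-negative and sum to one, so the pooled vector is a weighted average of time steps. With an all-zero scoring vector the weights become uniform and the result reduces to ordinary mean pooling, which is the baseline that learned weights are meant to improve on.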