April 16, 2026 · Open Access

Recognition of action phases from spatio-temporal intermediate representations in amateur sports competition videos

Key Points

Aimed to automatically distinguish action phases from non-action phases (e.g. timeouts) in amateur sports competition videos.
Developed a method based on local planar intermediate representations that capture both spatial and temporal information.
Used lightweight 2D neural models optimized for video classification.
Evaluated the proposed method across different visual backbones, including convolutional networks and Transformer architectures.
Assessed performance on 1000 annotated videos from amateur football competitions and ran a zero-shot evaluation on rugby.
The proposed approach demonstrated acceptable performance in recognizing action phases.
Achieved real-time inference for live (soccer) sports broadcasting.
Highlighted the benefits of local planar representations for improving classification in settings with scarce labels.

Abstract

The video broadcasting of sports events generates a large volume of data every day. In this work, we address the recognition of action phases (as opposed to non-action phases, e.g. timeouts) in videos of amateur sports competitions. Automating this task is related to action classification in videos, where the state of the art relies on 3D deep neural approaches. Although these approaches perform well for professional sports and offline analysis, deploying them in an amateur or live processing setting can be problematic. In this study, we investigate the value of an approach relying on intermediate local planar representations that capture both spatial and temporal information from the input videos. These representations are compatible with lightweight 2D neural models that are optimized for video classification. We evaluate the proposed approach combined with different visual backbones, including traditional convolutional neural networks (e.g., ResNet, MobileNet) and Transformer-based architectures, and we compare our results with several baseline methods (2D and 3D models) on a dataset of 1000 annotated videos from amateur football competitions. Furthermore, we demonstrate the generalization capabilities of our method by evaluating it on a different sport (rugby) in a zero-shot setting. The results highlight the value of the proposed approach for sports competition videos, with acceptable performance and computation times that allow scaling to amateur and live processing settings.

• Lightweight 2D neural models classify action phases in amateur sports videos.
• Local planar intermediate representations capture spatial–temporal information.
• Random walks preserve spatiality and improve training with scarce labels.
• Real-time inference achieved for live (soccer) sports broadcasting.
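
The abstract does not specify how the planar representations are constructed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a grayscale clip stored as nested lists and samples the same random-walk path of pixels in every frame, which yields a 2D array (time × path position) that a 2D model could take as input. The function names `random_walk_path` and `planar_representation`, the centre start point, and the 4-neighbour step rule are all hypothetical choices made for this example.

```python
import random

def random_walk_path(height, width, length, seed=0):
    """Return `length` (row, col) positions visited by a 4-neighbour random walk.

    Consecutive positions are at most one pixel apart, so neighbouring samples
    along the path are also neighbours in the image.
    """
    rng = random.Random(seed)
    row, col = height // 2, width // 2  # assumption: start at the frame centre
    path = [(row, col)]
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    while len(path) < length:
        d_row, d_col = rng.choice(steps)
        # Clamp to the frame so the walk never leaves the image.
        row = min(max(row + d_row, 0), height - 1)
        col = min(max(col + d_col, 0), width - 1)
        path.append((row, col))
    return path

def planar_representation(video, path):
    """Build a 2D array with one row per frame and one column per path position."""
    return [[frame[r][c] for (r, c) in path] for frame in video]

# Toy grayscale clip: 4 frames of 6x8 pixels; each value encodes (t, row, col).
T, H, W = 4, 6, 8
video = [[[100 * t + 10 * r + c for c in range(W)] for r in range(H)]
         for t in range(T)]

path = random_walk_path(H, W, length=16, seed=42)
plane = planar_representation(video, path)
print(len(plane), len(plane[0]))  # 4 16
```

In this sketch, reading a row of `plane` gives spatial context within one frame, while reading a column gives the evolution of one pixel over time, which is one plausible way a single 2D image can carry both kinds of information.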

Authors

Lazhar Bouacha

Université Paris Cité

Thierry Magnien

Château de Longchamp

Laurent Wendling

Journals

Computer Vision and Image Understanding

Institutions

Université Paris Cité

Château de Longchamp
