Video broadcasting of sports events generates a large volume of data every day. In this work, we address the recognition of action phases (as opposed to non-action phases, e.g. timeouts) in videos of amateur sports competitions. Automating this task is related to video action classification, where the state of the art relies on 3D deep neural networks. Although these approaches perform well for professional sports and offline analysis, deploying them in an amateur or live processing setting can be problematic. In this study, we investigate an approach based on intermediate local planar representations that capture both spatial and temporal information from the input videos. These representations are compatible with lightweight 2D neural models optimized for video classification. We evaluate the proposed approach with different visual backbones, including traditional convolutional neural networks (e.g., ResNet, MobileNet) and Transformer-based architectures, and compare our results with several baseline methods (2D and 3D models) on a dataset of 1000 annotated videos from amateur football competitions. Furthermore, we demonstrate the generalization capabilities of our method by evaluating it on a different sport (rugby) in a zero-shot setting. The results highlight the value of the proposed approach for sports competition videos, with acceptable performance and computation times that allow scaling to amateur and live processing settings.

• Lightweight 2D neural models classify action phases in amateur sports videos.
• Local planar intermediate representations capture spatial–temporal information.
• Random walks preserve spatiality and improve training with scarce labels.
• Real-time inference achieved for live (soccer) sports broadcasting.
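The abstract does not describe how the planar representations are constructed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes grayscale frames stored as nested lists and a 4-neighbour random walk over pixel positions; the function names `random_walk` and `planar_representation` are hypothetical. Pixels visited by the walk are read from every frame, giving a 2D plane (time by walk position) that a 2D model could consume.

```python
import random


def random_walk(height, width, steps, seed=0):
    """Return `steps` (row, col) positions from a 4-neighbour random walk.

    Consecutive positions are always adjacent pixels, so neighbouring
    columns of the resulting plane come from neighbouring image locations.
    """
    if height * width < 2 and steps > 1:
        raise ValueError("grid too small for a walk of more than one step")
    rng = random.Random(seed)
    r, c = rng.randrange(height), rng.randrange(width)
    path = [(r, c)]
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    while len(path) < steps:
        dr, dc = rng.choice(moves)
        nr, nc = r + dr, c + dc
        # Reject moves that would leave the frame and draw again.
        if 0 <= nr < height and 0 <= nc < width:
            r, c = nr, nc
            path.append((r, c))
    return path


def planar_representation(clip, path):
    """Build a 2D plane: one row per frame, one column per walk position."""
    return [[frame[r][c] for (r, c) in path] for frame in clip]


# Toy clip: 4 frames of 6x8 grayscale pixels (synthetic values, not real data).
clip = [
    [[t * 100 + r * 8 + c for c in range(8)] for r in range(6)]
    for t in range(4)
]
path = random_walk(height=6, width=8, steps=16, seed=0)
plane = planar_representation(clip, path)
```

In this sketch the vertical axis of `plane` carries temporal information and the horizontal axis carries local spatial information, which is one plausible reading of "local planar representations"; the actual construction in the paper may differ.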
Lazhar Bouacha
Université Paris Cité
Thierry Magnien
Château de Longchamp
Laurent Wendling
Computer Vision and Image Understanding
Université Paris Cité
Château de Longchamp
Bouacha et al. studied this question.
synapsesocial.com/papers/69e07c1e2f7e8953b7cbd973 — DOI: https://doi.org/10.1016/j.cviu.2026.104748