What question did this study set out to answer?

To improve action recognition in freestyle wrestling using a novel dataset and deep learning techniques.

March 25, 2026 · Open Access

A CNN–Bi-LSTM pipeline and open FSW dataset for freestyle wrestling action recognition

Key Points

Developed a dataset of 210 clips covering seven wrestling techniques.
Employed a DeepLabV3+ model for foreground segmentation of athletes.
Extracted features with CNN backbones including VGG16, InceptionV3, and EfficientNet-B7.
Used a Bi-LSTM for aggregating features and producing predictions.
Applied group-aware six-fold cross-validation to evaluate model performance.
Achieved 82.9% top-1 accuracy with the best model configuration.
Demonstrated consistent gains from foreground segmentation for the strongest backbone, at the cost of additional inference latency.
Found the largest improvements on high-occlusion techniques.
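
The group-aware evaluation mentioned above can be illustrated with a minimal sketch. It assumes each clip carries a match/session ID and assigns whole sessions to folds, so that clips from one session never appear in both the training and test sets. The function name `group_kfold` and the session IDs are hypothetical, and the greedy balancing is only one possible strategy, not necessarily the authors' protocol. This sketch also does not balance technique classes across folds; scikit-learn's `StratifiedGroupKFold` is one existing option for that.

```python
from collections import defaultdict


def group_kfold(groups, n_splits=6):
    """Yield (train_idx, test_idx) so that no group appears on both sides."""
    # Collect clip indices per group (e.g. per match/session ID).
    members = defaultdict(list)
    for idx, group in enumerate(groups):
        members[group].append(idx)
    if len(members) < n_splits:
        raise ValueError("need at least as many groups as folds")

    # Greedy balancing: place the largest groups first, each into the
    # currently smallest fold, so fold sizes stay roughly equal.
    folds = [[] for _ in range(n_splits)]
    for group in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(n_splits), key=lambda k: len(folds[k]))
        folds[smallest].extend(members[group])

    # Each fold serves once as the test set; the rest form the training set.
    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(i for j in range(n_splits) if j != k for i in folds[j])
        yield train, test


# Hypothetical example: 12 clips from 6 sessions (IDs are made up).
session_ids = ["s1", "s1", "s2", "s2", "s3", "s3",
               "s4", "s4", "s5", "s5", "s6", "s6"]
splits = list(group_kfold(session_ids, n_splits=6))
for train_idx, test_idx in splits:
    train_groups = {session_ids[i] for i in train_idx}
    test_groups = {session_ids[i] for i in test_idx}
    # No session contributes clips to both sides of a split.
    assert train_groups.isdisjoint(test_groups)
```

Splitting by session rather than by clip matters because clips cut from the same match share athletes, mats, and camera angles; a per-clip split could let the model score well by recognizing the scene instead of the technique.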

Abstract

Human action recognition in close-contact sports is hindered by mutual occlusion, rapid pose changes, and distracting backgrounds. We study freestyle wrestling—a representative close-contact setting with sustained physical interaction—and present the Open FSW dataset of 210 trimmed clips covering seven techniques (30 clips per class), sourced from both controlled training sessions and broadcast footage. We introduce a foreground-aware RGB pipeline that segments athletes with a fine-tuned DeepLabV3+ model, extracts per-frame features using CNN backbones (VGG16, InceptionV3, EfficientNet-B7), and aggregates them with a bidirectional LSTM to produce clip-level predictions. Under a group-aware six-fold cross-validation protocol stratified by match/session ID to reduce train–test contamination across related sequences, the best configuration (DeepLabV3+ (foreground) + EfficientNet-B7 + Bi-LSTM) attains 82.9% top-1 accuracy. Ablation results quantify the added value of foregrounding, showing consistent gains for the strongest backbone and the largest improvements on high-occlusion techniques, at the cost of additional inference latency due to segmentation. Due to the modest dataset size, we mitigate overfitting via transfer learning and extensive augmentation, and we frame conclusions as domain-specific to freestyle wrestling. The dataset and code are released. To comply with copyright constraints, the controlled subset is provided as processed clips, while the broadcast subset is released as annotations and clip metadata to enable reconstruction.
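
To make the foregrounding step concrete, the sketch below applies a binary athlete mask to a single RGB frame before it would be passed to a feature extractor. The abstract does not state how the segmented foreground is composited, so replacing background pixels with black is an assumption here, and the function name `apply_foreground_mask` and the toy data are hypothetical. In the described pipeline the mask would come from the fine-tuned DeepLabV3+ model; here it is written by hand.

```python
def apply_foreground_mask(frame, mask, fill=(0, 0, 0)):
    """Keep pixels where mask is 1; replace all other pixels with `fill`."""
    same_height = len(frame) == len(mask)
    same_width = all(len(row) == len(m_row) for row, m_row in zip(frame, mask))
    if not (same_height and same_width):
        raise ValueError("frame and mask must have the same height and width")
    return [
        [pixel if keep else fill for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(frame, mask)
    ]


# Toy 2x3 RGB frame and a hand-written mask (1 = athlete, 0 = background).
frame = [
    [(10, 20, 30), (40, 50, 60), (70, 80, 90)],
    [(15, 25, 35), (45, 55, 65), (75, 85, 95)],
]
mask = [
    [0, 1, 1],
    [0, 0, 1],
]
masked = apply_foreground_mask(frame, mask)
```

Running the segmentation model on every frame is what adds the inference latency noted in the abstract, since the mask must be computed before any per-frame features are extracted.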

Cite This Study

Rostamian et al. studied this question.

synapsesocial.com/papers/69c37adcb34aaaeb1a67cd29 https://doi.org/10.1038/s41598-026-44782-0
