To meet the demand for efficient skeletal keypoint recognition in outdoor physical training assessment, this study introduces a lightweight Cross Stage Partial Poseur Network (CSP-Poseur). The proposed model enhances the original Poseur backbone with an improved Cross Stage Partial (CSP) structure that reduces parameter redundancy. It further integrates a Convolutional Block Attention Module (CBAM) and a Gated Attention Unit (GAU) to strengthen feature discrimination in key joint regions and improve adaptability to complex environments. Experiments on the COCO and MPII datasets show that CSP-Poseur outperforms the baseline Poseur. On COCO, it attains a mean Average Precision (mAP) of 76.7%, with AP50 of 91.3% and AP75 of 83.9%, exceeding Poseur by 1.72%, 0.88%, and 2.07%, respectively. On MPII, it reaches an mAP of 90.9%, exceeding Poseur by 0.44%. Despite these gains, the model remains efficient, requiring only 14.9M parameters and 1.18G FLOPs, both considerably lower than those of mainstream approaches. Ablation studies verify that CBAM and GAU each improve skeletal keypoint modeling, and experiments on the decoding structure show that a four-layer decoder offers the best balance between accuracy and computational cost. Overall, CSP-Poseur achieves an effective trade-off between precision and efficiency, making it well suited to real-time pose estimation and training action evaluation on edge devices.
Li et al. (Thu,) studied this question.
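
For readers unfamiliar with the COCO metrics quoted above, AP50 and AP75 count a predicted pose as correct when its Object Keypoint Similarity (OKS) with the ground truth reaches 0.50 or 0.75, respectively. The following is a minimal sketch of the standard COCO OKS definition, not the evaluation code used in this study; the function names `oks` and `is_match` and the toy coordinates are hypothetical and chosen only for illustration.

```python
import math
import sys


def oks(pred, gt, visible, area, sigmas):
    """Object Keypoint Similarity between one predicted and one ground-truth pose.

    pred, gt : lists of (x, y) keypoint coordinates in pixels
    visible  : per-keypoint flags; only keypoints with a flag > 0 are scored
    area     : ground-truth object area in pixels^2 (sets the scale)
    sigmas   : per-keypoint falloff constants (COCO publishes one per joint)
    """
    eps = sys.float_info.epsilon  # guards against a zero-area object
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2   # squared pixel distance
        k2 = (2.0 * sigma) ** 2                # squared per-keypoint constant
        total += math.exp(-d2 / (2.0 * (area + eps) * k2))
        count += 1
    return total / count if count else 0.0


def is_match(score, threshold):
    """A pose counts as a true positive at a given OKS threshold (0.50 for AP50)."""
    return score >= threshold


# Toy example with three keypoints (values are illustrative assumptions).
gt = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
pred = [(11.0, 10.0), (20.0, 22.0), (30.0, 30.0)]
score = oks(pred, gt, visible=[2, 2, 1], area=400.0, sigmas=[0.025, 0.035, 0.079])
print(f"OKS = {score:.3f}, AP50 match: {is_match(score, 0.50)}, "
      f"AP75 match: {is_match(score, 0.75)}")
```

Because the distance is normalized by object area and a per-keypoint constant, the same pixel error is penalized more on small people and on tightly localized joints. The reported mAP then averages precision over a range of such OKS thresholds.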