Human pose estimation is a central problem in computer vision, with applications in robotics, augmented reality, sports analysis, and biomechanics. Traditional methods, while effective in controlled environments, often fail to generalize to real-world scenarios because of occlusions, scale variations, and temporal inconsistencies in video data. To address these limitations, we propose the Hierarchical Spatio-Temporal Pose Network (HSTPN), a deep learning framework that integrates multi-scale feature fusion with attention mechanisms to capture both global context and fine-grained details. We pair it with the Adaptive Pose Refinement Strategy (APRS), which iteratively refines keypoint locations using spatial, temporal, and domain-specific constraints. Together, these components improve accuracy and robustness across diverse datasets covering both constrained and unconstrained environments. Experimental results show that HSTPN and APRS outperform state-of-the-art methods in prediction accuracy, temporal coherence, and computational efficiency, making them well suited to real-time and interdisciplinary physics applications.
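The abstract does not specify how APRS performs its refinement, so the following is only an illustrative sketch of the general idea of iteratively refining keypoints under a temporal constraint, not the authors' method. The function name `refine_keypoints`, the per-joint confidence weights, and the neighbor-averaging update are all assumptions introduced here for illustration.

```python
import numpy as np

def refine_keypoints(observed, confidence, n_iters=10):
    """Iteratively pull low-confidence keypoints toward their temporal neighbors.

    observed:   array of shape (T, J, 2), per-frame 2D joint positions (hypothetical input).
    confidence: array of shape (T, J), values in [0, 1] (hypothetical detector scores).
    """
    observed = np.asarray(observed, dtype=float)
    w = np.asarray(confidence, dtype=float)[..., None]  # (T, J, 1) for broadcasting
    refined = observed.copy()
    for _ in range(n_iters):
        # Pad along time so the first and last frames reuse themselves
        # in place of the missing neighbor.
        padded = np.pad(refined, ((1, 1), (0, 0), (0, 0)), mode="edge")
        neighbor_mean = 0.5 * (padded[:-2] + padded[2:])
        # Confident joints stay near the observation; uncertain ones
        # follow the temporal average of the current estimate.
        refined = w * observed + (1.0 - w) * neighbor_mean
    return refined

# One joint moving along a line, with an outlier at frame 2 marked as unreliable.
observed = np.array([[[0, 0]], [[1, 1]], [[9, 9]], [[3, 3]], [[4, 4]]], dtype=float)
confidence = np.array([[1.0], [1.0], [0.0], [1.0], [1.0]])
refined = refine_keypoints(observed, confidence)
```

In this toy case the zero-confidence frame is replaced by the average of its two neighbors, while fully confident frames are left unchanged. A real system would likely add spatial constraints such as limb-length consistency, which this sketch omits.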
Zhuo Li (Mon,) studied this question.