In regression-based 2D human pose estimation, accurate keypoint localization in crowded and occluded scenes remains challenging due to insufficient modeling of structural dependencies among joints. To address this issue, this paper proposes MACS-Pose, a topological-consistency-aware framework for robust pose estimation. The proposed method systematically incorporates topology-consistency cues into feature representation, semantic propagation, and regression supervision. Specifically, a Hierarchical Aggregation Multi-branch Network (HAMANet) is designed to jointly capture local appearance details and global structural semantics. A Cross-Stage Semantic Enhancement Stage (CSSE-Stage) is introduced to alleviate semantic degradation during deep feature transmission. Furthermore, an Adaptive Skeleton-aware Keypoint Regression Loss (A-SKE Loss) is developed to enforce skeletal topology consistency during coordinate regression. Experimental results on the COCO 2017 and MPII datasets demonstrate that MACS-Pose consistently outperforms representative regression-based methods. Compared with YOLOv11s-Pose, it improves AP from 68.9% to 73.3% and AR from 76.9% to 80.2% on COCO 2017, while achieving 90.4% PCKh@0.5 on MPII. With 16.8 M parameters and real-time inference capability, the proposed method achieves a favorable balance between accuracy and efficiency, showing strong potential for resource-constrained vision applications.
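For readers unfamiliar with the MPII metric quoted above, PCKh@0.5 counts a predicted keypoint as correct when it lies within half the head segment length of its ground-truth location. The following is a minimal sketch of that computation, not the authors' evaluation code. The function name `pckh`, its argument layout, and the toy coordinates are assumptions made for illustration, and the head size is assumed to be supplied by the caller.

```python
import math


def pckh(pred, gt, head_sizes, visible=None, alpha=0.5):
    """Return the fraction of keypoints within alpha * head size of ground truth.

    pred, gt    : list of persons, each a list of (x, y) keypoints (hypothetical layout)
    head_sizes  : per-person head segment length in pixels (assumed precomputed)
    visible     : optional per-person list of booleans; False entries are skipped
    alpha       : threshold factor, 0.5 for PCKh@0.5
    """
    correct = 0
    total = 0
    for i, (pred_kps, gt_kps) in enumerate(zip(pred, gt)):
        threshold = alpha * head_sizes[i]
        for j, ((px, py), (gx, gy)) in enumerate(zip(pred_kps, gt_kps)):
            if visible is not None and not visible[i][j]:
                continue  # unannotated joints do not count toward the score
            total += 1
            if math.hypot(px - gx, py - gy) <= threshold:
                correct += 1
    return correct / total if total else 0.0


# Toy example: one person, three joints, head size 10 px (threshold 5 px).
pred = [[(10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]]
gt = [[(10.0, 12.0), (20.0, 30.0), (31.0, 30.0)]]
head_sizes = [10.0]

# Joint errors are 2 px, 10 px, and 1 px, so 2 of 3 joints are correct.
print(pckh(pred, gt, head_sizes))
```

Normalizing by head size rather than by a fixed pixel distance makes the score comparable across people who appear at different scales in the image.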
Hao et al. (Sat,) studied this question.