What question did this study set out to answer?

The aim is to improve 2D human pose estimation by accurately localizing joints even in crowded and occluded environments.

May 13, 2026 · Open Access

MACS-Pose: Topological-Consistency-Aware Regression for 2D Human Pose Estimation

Key Points

Developed the MACS-Pose framework, which incorporates topology-consistency cues into feature representation, semantic propagation, and regression supervision.
Designed a Hierarchical Aggregation Multi-branch Network (HAMANet) to jointly capture local appearance details and global structural semantics.
Introduced an Adaptive Skeleton-aware Keypoint Regression Loss (A-SKE Loss) to enforce skeletal topology consistency during coordinate regression.
Achieved 73.3% AP and 80.2% AR on COCO 2017, up from 68.9% AP and 76.9% AR for the YOLOv11s-Pose baseline.
Attained 90.4% PCKh@0.5 on the MPII dataset, indicating effective joint localization.
Demonstrated real-time inference capability with 16.8 M parameters, balancing accuracy and efficiency.
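
For reference, PCKh@0.5 counts a predicted joint as correct when its distance to the ground-truth joint is at most 0.5 times the head segment length of that person. A minimal sketch in plain Python follows; the function name `pckh`, its arguments, and the toy coordinates are illustrative assumptions, not code or data from the paper.

```python
import math


def pckh(pred, gt, head_size, visible=None, alpha=0.5):
    """Fraction of annotated joints whose error is within alpha * head_size.

    pred, gt  : lists of (x, y) joint coordinates for one person
    head_size : head segment length for that person, in pixels
    visible   : optional list of booleans; joints marked False are skipped
    """
    if visible is None:
        visible = [True] * len(gt)
    threshold = alpha * head_size
    hits = 0
    total = 0
    for (px, py), (gx, gy), vis in zip(pred, gt, visible):
        if not vis:
            continue
        total += 1
        if math.hypot(px - gx, py - gy) <= threshold:
            hits += 1
    return hits / total if total else float("nan")


# Toy example (made-up numbers): four joints, head size of 20 px,
# so the acceptance radius is 10 px.
gt = [(10.0, 10.0), (20.0, 10.0), (30.0, 40.0), (50.0, 60.0)]
pred = [(11.0, 10.0), (20.0, 13.0), (30.0, 52.0), (50.0, 61.0)]
score = pckh(pred, gt, head_size=20.0)
print(score)  # 0.75: three of the four joints fall within 10 px
```

A dataset-level score would aggregate these per-joint hits over all annotated people; the exact aggregation and head-size convention used in the paper's evaluation are not stated in this summary.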

Abstract

In regression-based 2D human pose estimation, accurate keypoint localization in crowded and occluded scenes remains challenging due to insufficient modeling of structural dependencies among joints. To address this issue, this paper proposes MACS-Pose, a topological-consistency-aware framework for robust pose estimation. The proposed method systematically incorporates topology-consistency cues into feature representation, semantic propagation, and regression supervision. Specifically, a Hierarchical Aggregation Multi-branch Network (HAMANet) is designed to jointly capture local appearance details and global structural semantics. A Cross-Stage Semantic Enhancement Stage (CSSE-Stage) is introduced to alleviate semantic degradation during deep feature transmission. Furthermore, an Adaptive Skeleton-aware Keypoint Regression Loss (A-SKE Loss) is developed to enforce skeletal topology consistency during coordinate regression. Experimental results on the COCO 2017 and MPII datasets demonstrate that MACS-Pose consistently outperforms representative regression-based methods. Compared with YOLOv11s-Pose, it improves AP from 68.9% to 73.3% and AR from 76.9% to 80.2% on COCO 2017, while achieving 90.4% PCKh@0.5 on MPII. With 16.8 M parameters and real-time inference capability, the proposed method achieves a favorable balance between accuracy and efficiency, showing strong potential for resource-constrained vision applications.
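
The abstract does not give the form of A-SKE Loss, so the following is only a generic illustration of what a skeleton-aware regression term can look like: a per-joint L1 coordinate error plus a penalty on mismatched bone lengths. The skeleton `BONES`, the function names, and the fixed `weight` are all hypothetical choices made for this sketch and should not be read as the authors' method (in particular, nothing here is adaptive).

```python
import math

# Hypothetical 5-joint skeleton given as (parent, child) index pairs.
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def bone_lengths(joints, bones):
    """Euclidean length of each bone for one pose."""
    return [math.dist(joints[a], joints[b]) for a, b in bones]


def skeleton_aware_loss(pred, gt, bones=BONES, weight=0.5):
    """Mean L1 coordinate error plus a weighted bone-length mismatch term."""
    coord_term = sum(
        abs(px - gx) + abs(py - gy) for (px, py), (gx, gy) in zip(pred, gt)
    ) / len(gt)
    bone_term = sum(
        abs(p - g)
        for p, g in zip(bone_lengths(pred, bones), bone_lengths(gt, bones))
    ) / len(bones)
    return coord_term + weight * bone_term


# Toy pose (made-up coordinates); the prediction moves joint 2 one unit away,
# which stretches bone (1, 2) from length 1 to length 2.
gt_pose = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0), (1.0, 1.0)]
pred_pose = [(0.0, 0.0), (0.0, 1.0), (0.0, 3.0), (-1.0, 1.0), (1.0, 1.0)]
loss = skeleton_aware_loss(pred_pose, gt_pose)
```

The point of the second term is that an error which distorts the skeleton is penalized more than the coordinate error alone would suggest, which is one simple way to inject structural dependencies among joints into coordinate regression.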
