Robust 2D human pose estimation remains challenging due to occlusion and background interference, which introduce substantial uncertainty into visual representations. This paper proposes PMNet, a Parallel Modeling Network that integrates explicit graph-based structural modeling and implicit self-attention-based semantic modeling through parallel pathways to jointly capture local dependencies and global contextual relationships among keypoints. From an information-theoretic perspective, occlusion and clutter can be interpreted as sources of increased representational entropy, and PMNet addresses this issue by progressively reducing uncertainty through complementary structural reasoning and attention-based information selection. The framework incorporates a criss-cross attention module to suppress irrelevant features, an adaptive nonlinear fusion strategy to balance complementary information across parallel branches, and an error-compensated decoding method to sharpen heatmap distributions and refine keypoint localization while maintaining efficiency. Extensive experiments on the MPII and COCO benchmarks demonstrate that PMNet achieves state-of-the-art or comparable performance, attaining 92.42% PCKh@0.5 on MPII and 77.3% AP on COCO. Ablation studies and qualitative visualizations further confirm the effectiveness of each component, showing improved signal-to-noise ratios and more concentrated heatmap responses. Overall, PMNet provides a robust and efficient pose estimation framework with strong potential for real-world applications such as surveillance and autonomous systems.
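The entropy view of heatmap sharpening described above can be illustrated with a minimal sketch. It is not PMNet's decoding method: the Gaussian heatmaps, the function names `gaussian_heatmap` and `heatmap_entropy`, and the use of Shannon entropy over a normalized heatmap are all assumptions made for illustration. The sketch only shows that a more concentrated response has lower entropy than a diffuse one.

```python
import math


def gaussian_heatmap(size, center, sigma):
    """Build a size x size heatmap with a Gaussian bump at center=(x, y).

    Hypothetical helper: stands in for a keypoint heatmap predicted by a network.
    """
    cx, cy = center
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]


def heatmap_entropy(heatmap):
    """Shannon entropy (in nats) of a non-negative heatmap normalized to sum to 1.

    Assumption: entropy of the normalized response is used as an uncertainty
    proxy; the paper may define its measure differently.
    """
    total = sum(sum(row) for row in heatmap)
    entropy = 0.0
    for row in heatmap:
        for value in row:
            p = value / total
            if p > 0.0:  # skip zero-probability cells, since 0 * log(0) is taken as 0
                entropy -= p * math.log(p)
    return entropy


# A diffuse response (large sigma) versus a concentrated one (small sigma).
diffuse = gaussian_heatmap(64, (32, 32), sigma=6.0)
sharp = gaussian_heatmap(64, (32, 32), sigma=2.0)

h_diffuse = heatmap_entropy(diffuse)
h_sharp = heatmap_entropy(sharp)

print(f"diffuse heatmap entropy: {h_diffuse:.3f} nats")
print(f"sharp heatmap entropy:   {h_sharp:.3f} nats")
```

Under these assumptions, the sharper heatmap yields the smaller entropy, and both values stay below the uniform-distribution bound of log(64 × 64).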
Zhao et al. (Sat,) studied this question.