Human pose estimation in crowded images remains difficult because the visual evidence around many joints is incomplete, and responses from nearby persons may be mistakenly incorporated into the target skeleton. To address this issue, this paper presents CSPA-Net, a heatmap-based pose estimation framework that controls how structural information propagates during occluded-joint recovery. The network first estimates the reliability of each joint from coarse heatmaps by considering both the dispersion and the spatial spread of the response distribution. Using soft joint locations derived from these heatmaps together with the resulting uncertainty cues, it constructs a Skeleton-consistent Manhattan Constraint that defines a target-oriented spatial prior. This prior limits structural propagation to regions that are more consistent with the estimated target skeleton, reducing the chance of incorporating features from adjacent instances. In addition, a Pose-Structured Cross-Axis Attention module exchanges row-wise and column-wise contextual information so that lateral body symmetry and vertical kinematic dependencies can be modeled more directly. Finally, multiscale adaptive aggregation combines coarse structural cues with fine local details for heatmap prediction. CSPA-Net achieves 75.3% AP and 80.9% AR on COCO val2017 and 69.6% AP on the CrowdPose test set, outperforming the HRNet-W32 baseline under the same input setting. These results suggest that controlled structural propagation is useful for improving pose estimation in occluded and crowded scenes.
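The abstract does not give formulas for the joint reliability score or the Skeleton-consistent Manhattan Constraint, so the following is only a minimal sketch of one plausible reading, not CSPA-Net's actual implementation. It assumes that soft joint locations come from a soft-argmax over a softmax-normalised heatmap, that "dispersion" is measured as normalised entropy and "spatial spread" as the root-mean-square radius around the soft location, and that the spatial prior decays with Manhattan (L1) distance from each joint, weighted by that joint's reliability. All function names and parameters (`beta`, `spread_scale`, `tau`) are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # indexed as grid[y][x]


def softmax_2d(heatmap: Grid, beta: float = 1.0) -> Grid:
    """Turn raw heatmap scores into a probability map (temperature beta)."""
    peak = max(max(row) for row in heatmap)  # subtract the max for numerical stability
    exps = [[math.exp(beta * (v - peak)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def soft_location(prob: Grid) -> Tuple[float, float]:
    """Soft-argmax: the expected (x, y) position under the probability map."""
    ex = sum(v * x for row in prob for x, v in enumerate(row))
    ey = sum(v * y for y, row in enumerate(prob) for v in row)
    return ex, ey


def joint_reliability(prob: Grid, loc: Tuple[float, float],
                      spread_scale: float = 2.0) -> float:
    """Hypothetical score in [0, 1]: high for peaked, compact responses."""
    n = len(prob) * len(prob[0])
    # Dispersion (assumed): Shannon entropy divided by its maximum, log(n).
    entropy = -sum(v * math.log(v) for row in prob for v in row if v > 0.0)
    dispersion = entropy / math.log(n)
    # Spatial spread (assumed): RMS distance of the response from the soft location.
    ex, ey = loc
    variance = sum(
        v * ((x - ex) ** 2 + (y - ey) ** 2)
        for y, row in enumerate(prob)
        for x, v in enumerate(row)
    )
    spread = math.sqrt(variance)
    # Clamp guards against round-off pushing the normalised entropy slightly above 1.
    return max(0.0, 1.0 - dispersion) * math.exp(-spread / spread_scale)


def manhattan_prior(height: int, width: int,
                    joints: List[Tuple[Tuple[float, float], float]],
                    tau: float = 3.0) -> Grid:
    """Hypothetical prior: joint j contributes r_j * exp(-L1 distance / tau); keep the max."""
    prior = [[0.0] * width for _ in range(height)]
    for (jx, jy), r in joints:
        for y in range(height):
            for x in range(width):
                weight = r * math.exp(-(abs(x - jx) + abs(y - jy)) / tau)
                prior[y][x] = max(prior[y][x], weight)
    return prior


def gate_features(features: Grid, prior: Grid) -> Grid:
    """Attenuate features far from reliable joints of the target skeleton."""
    return [[f * w for f, w in zip(f_row, w_row)]
            for f_row, w_row in zip(features, prior)]


if __name__ == "__main__":
    size = 7
    sharp = [[0.0] * size for _ in range(size)]
    sharp[3][2] = 10.0  # a confident peak at (x=2, y=3)
    flat = [[0.0] * size for _ in range(size)]  # no evidence: uniform response

    joints = []
    for hm in (sharp, flat):
        prob = softmax_2d(hm)
        loc = soft_location(prob)
        joints.append((loc, joint_reliability(prob, loc)))
    prior = manhattan_prior(size, size, joints)
    print([round(r, 3) for _, r in joints])
```

In this sketch a flat, uninformative heatmap receives near-zero reliability and therefore contributes almost nothing to the prior, so gating is driven by the confident joint. The max-over-joints combination is one simple choice; a weighted sum or a limb-aware distance would be equally plausible readings of the abstract.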
Li et al. (Mon,) studied this question.