In this study, we propose a novel approach to human pose estimation (HPE) in occluded scenes that progressively fuses features extracted from RGB-D images, each consisting of an RGB image and a depth image. Conventional bottom-up HPE models that rely solely on RGB inputs often produce erroneous skeletons when parts of a person's body are obscured by another individual, because without 3D topological information they struggle to infer body connectivity accurately. To address this limitation, we modify OpenPose, a bottom-up HPE model, to take a depth image as an additional input, thereby providing explicit 3D spatial cues. Each input modality is processed by a dedicated feature extractor. In addition to the two existing modules in each stage (joint connectivity estimation and joint confidence map estimation for the color image), we integrate a new module that estimates joint confidence maps for the depth image into the initial few stages. The confidence maps derived from the depth and RGB modalities are then fused at each stage and forwarded to the next, ensuring that 3D topological information from the depth image is effectively utilized for both joint localization and body part association. Experimental results on the NTU RGB+D 120 dataset show that the proposed approach achieves a 13.3% improvement in average recall over the original OpenPose model. The proposed method can thus improve the performance of bottom-up HPE models in occluded scenes.
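The stage-wise fusion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the fusion operator, so an element-wise weighted average is assumed here, and the names `fuse`, `progressive_fusion`, `toy_stage`, `depth_weight`, `num_stages`, and `num_depth_stages` are hypothetical. The joint connectivity branch is omitted for brevity, and confidence maps are plain nested lists rather than network tensors.

```python
from typing import Callable, List, Optional

Map2D = List[List[float]]  # one 2D confidence map (rows x columns)
# A stage maps (modality features, previous fused map or None) to a confidence map.
Stage = Callable[[Map2D, Optional[Map2D]], Map2D]


def fuse(rgb_conf: Map2D, depth_conf: Map2D, depth_weight: float = 0.5) -> Map2D:
    """Element-wise weighted average of two maps (assumed fusion operator)."""
    same_shape = len(rgb_conf) == len(depth_conf) and all(
        len(r) == len(d) for r, d in zip(rgb_conf, depth_conf)
    )
    if not same_shape:
        raise ValueError("confidence maps must have the same shape")
    w = depth_weight
    return [
        [(1.0 - w) * r + w * d for r, d in zip(rgb_row, depth_row)]
        for rgb_row, depth_row in zip(rgb_conf, depth_conf)
    ]


def progressive_fusion(
    rgb_feat: Map2D,
    depth_feat: Map2D,
    rgb_stage: Stage,
    depth_stage: Stage,
    num_stages: int = 6,        # assumed stage count
    num_depth_stages: int = 3,  # assumed size of "the initial few stages"
) -> Map2D:
    """Run the stages in order, fusing depth maps into the early ones."""
    prev: Optional[Map2D] = None
    for stage_idx in range(num_stages):
        rgb_conf = rgb_stage(rgb_feat, prev)
        if stage_idx < num_depth_stages:
            # Early stages: estimate a depth confidence map and fuse it in.
            depth_conf = depth_stage(depth_feat, prev)
            prev = fuse(rgb_conf, depth_conf)
        else:
            # Later stages: only the RGB branch refines the forwarded map.
            prev = rgb_conf
    if prev is None:
        raise ValueError("num_stages must be at least 1")
    return prev


def toy_stage(feat: Map2D, prev: Optional[Map2D]) -> Map2D:
    """Placeholder for a learned stage: blends features with the previous map."""
    if prev is None:
        return [row[:] for row in feat]
    return [
        [0.5 * f + 0.5 * p for f, p in zip(feat_row, prev_row)]
        for feat_row, prev_row in zip(feat, prev)
    ]


if __name__ == "__main__":
    rgb = [[0.2, 0.8]]    # toy RGB response for two pixels
    depth = [[0.6, 0.4]]  # toy depth response for the same pixels
    print(progressive_fusion(rgb, depth, toy_stage, toy_stage,
                             num_stages=2, num_depth_stages=1))
```

In a real model each stage would be a convolutional block, and the fused map would typically be combined with the backbone features before entering the next stage. The sketch only shows the control flow: depth-derived confidence enters during the early stages and is carried forward by the later ones.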
Yoon et al. (Thu,) studied this question.