Conventional deep learning-based 3D human pose estimators often exhibit limited robustness on occluded inputs because their training distributions are biased toward fully visible or minimally occluded human poses. To address this generalization gap, a novel framework, ORPESO (Occlusion Robust Pose Estimation via Synthetic Occlusion), is introduced that explicitly incorporates occlusion diversity into both the learning and inference stages. During training, the proposed method constructs a comprehensive synthetic occlusion space by augmenting clean 3D pose sequences with varied and structured occlusion patterns, which encourages the model to learn multimodal spatiotemporal pose representations that are resilient to partial visibility. At test time, ORPESO applies an adaptive recalibration mechanism: it runs prediction on rotated samples and averages the predictions for the original and rotated samples. Rotation is applied during both training and testing to facilitate accurate pose recovery under severe occlusions. Extensive evaluations on standard 3D human pose estimation (3DHPE) benchmarks, including Human3.6M and MPI-INF-3DHP, show that ORPESO delivers consistent numerical improvements over state-of-the-art baselines. On Human3.6M, it improves the mean per-joint position error (MPJPE) by an average of 15.01 mm (20.6%) over existing transformer-based methods. On MPI-INF-3DHP, it delivers further gains, with PCK improved by 24.2%, AUC improved by 40.2%, and MPJPE reduced by 57.3% and 27.6%, respectively, compared to existing state-of-the-art methods.
Hossain et al. (Thu,) studied this question.
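The test-time step described in the abstract (predict on the original and on a rotated sample, then average) can be illustrated with a minimal sketch. The abstract does not specify the rotation axis, the angle, or how the rotated prediction is aligned before averaging, so the choices here are assumptions: the 2D input keypoints are rotated in the image plane by `theta`, and the resulting 3D prediction is rotated back about the z-axis before averaging. The names `predict_with_rotation_averaging`, `toy_lifter`, and `mpjpe` are hypothetical and are not taken from ORPESO.

```python
import numpy as np


def rot2d(theta):
    """2x2 in-plane rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def rot_z(theta):
    """3x3 rotation about the z-axis for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def predict_with_rotation_averaging(model, keypoints_2d, theta):
    """Average the prediction on the original and on a rotated input.

    keypoints_2d: array of shape (frames, joints, 2).
    model: callable mapping (frames, joints, 2) -> (frames, joints, 3).
    Assumption: the rotated prediction is rotated back before averaging.
    """
    pred_orig = model(keypoints_2d)
    # Rotate every 2D keypoint about the origin (row-vector convention).
    rotated_in = keypoints_2d @ rot2d(theta).T
    pred_rot = model(rotated_in)
    # Undo the rotation on the 3D output so both predictions share a frame.
    pred_back = pred_rot @ rot_z(-theta).T
    return 0.5 * (pred_orig + pred_back)


def mpjpe(pred, target):
    """Mean per-joint position error: mean Euclidean distance over all joints."""
    return float(np.mean(np.linalg.norm(pred - target, axis=-1)))


def toy_lifter(keypoints_2d):
    """Hypothetical stand-in for a trained 2D-to-3D lifting model.

    It copies x and y and uses the distance from the origin as depth,
    which makes it exactly equivariant to in-plane rotation.
    """
    depth = np.linalg.norm(keypoints_2d, axis=-1, keepdims=True)
    return np.concatenate([keypoints_2d, depth], axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    kp = rng.normal(size=(4, 17, 2))  # 4 frames, 17 joints
    averaged = predict_with_rotation_averaging(toy_lifter, kp, np.pi / 6)
    print(averaged.shape, mpjpe(averaged, toy_lifter(kp)))
```

Because the toy model is rotation-equivariant, the averaged output matches the plain prediction. With a trained network the two predictions generally differ, and averaging them acts as a small ensemble.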