Multi-person pose estimation within the You Only Look Once (YOLO) framework has advanced significantly in recent years. However, human body images are frequently blurred or anonymized to address privacy concerns, which substantially degrades the accuracy and reliability of pose estimation. To overcome this limitation, this article proposes an optimization scheme for YOLO-Pose that supports flexible structural configurations and custom training parameters to enhance adaptability. Specifically, a deconvolution-based upsampling module and a specialized blur data augmentation strategy are introduced to improve the model's robustness and generalization. Notably, even when trained exclusively on sharp images, the proposed model achieves superior predictive performance on blurred inputs. Furthermore, we design a universal skeleton connection method that enables YOLO-Pose to adapt to datasets with varying numbers of keypoints, increasing its versatility across diverse annotation standards. Experimental results on the CrowdPose dataset demonstrate the superiority of the proposed method. With a parameter count nearly identical to that of the self-trained YOLO12n-Pose baseline, our model achieves relative improvements of +4.1%, +15.4%, and +6.8% in mAP@50:95 on test sets corrupted by Gaussian blur, motion blur, and defocus blur, respectively, at the most severe degradation levels. The optimized model delivers robust and accurate pose estimation directly on blurred input images of varying intensity, highlighting its strong generalization capability under privacy-preserving visual conditions.
Yu et al. studied this question.
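To make the three corruption types named above concrete, the following is a minimal sketch of how Gaussian, motion, and defocus blur kernels could be built and applied to a grayscale image. It is not the authors' augmentation pipeline: the function names, the odd kernel sizes, the horizontal motion direction, the disk-shaped defocus kernel, and the edge-replication border handling are all assumptions chosen for illustration.

```python
import math


def normalize(kernel):
    """Scale a kernel so its weights sum to 1 (preserves mean brightness)."""
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def gaussian_kernel(size, sigma):
    """Isotropic Gaussian kernel; `size` is assumed to be odd."""
    c = size // 2
    return normalize([
        [math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ])


def motion_kernel(size):
    """Horizontal motion blur: uniform weights along the centre row only."""
    c = size // 2
    return normalize([
        [1.0 if y == c else 0.0 for x in range(size)]
        for y in range(size)
    ])


def defocus_kernel(size):
    """Defocus blur approximated by a uniform disk of radius size // 2."""
    c = size // 2
    return normalize([
        [1.0 if (x - c) ** 2 + (y - c) ** 2 <= c * c else 0.0
         for x in range(size)]
        for y in range(size)
    ])


def blur(image, kernel):
    """Filter a 2D image (list of rows of floats) with edge replication."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    c = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(k):
                for i in range(k):
                    # Clamp coordinates so border pixels are replicated.
                    yy = min(max(y + j - c, 0), h - 1)
                    xx = min(max(x + i - c, 0), w - 1)
                    acc += image[yy][xx] * kernel[j][i]
            out[y][x] = acc
    return out
```

In this sketch, severity would be controlled by the kernel size (and `sigma` for the Gaussian case); an augmentation step could pick one of the three kernels at random per training image. How the paper actually parameterizes its degradation levels is not stated in the abstract.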