Pose-guided human image generation aims to render the person in a source image in a specified pose. Current methods predominantly use 2D signals as pose conditions, and these signals carry inherent information deficits. As a result, it is difficult to establish precise source-target appearance-pose correspondence, which in turn causes uncertainty when predicting the appearance of self-occluded regions. To address these issues, we propose a 3D Pose Conditional Diffusion model (3DPCD) that leverages a human parametric model to integrate comprehensive and adjustable 3D control into the forward-backward diffusion steps. Specifically, we employ Fourier-transformed SMPL-X as the 3D pose representation, which captures the complete pose information and thereby facilitates precise source-target correspondence. Building on this, we further propose a hierarchical appearance-pose alignment method, which aligns appearance with the complete pose information at both global and local levels. Moreover, motivated by the fact that human pose transformation is a progressive process in 3D space and that our 3D pose representation is adjustable, we integrate progressively interpolated 3D control into a series of sampling steps. This effectively mitigates uncertainties in pixel transfer between poses. The proposed explicit pose-guided strategy also supports flexible adjustment of pose, shape, and viewpoint. Both quantitative and qualitative evaluations demonstrate that 3DPCD outperforms state-of-the-art methods on the widely used DeepFashion InShop benchmark and on our newly constructed PoseWeb-33 dataset, which features richer appearance variations and more diverse conditional poses.
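To make the idea of progressively interpolated, Fourier-encoded pose control more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a NeRF-style sinusoidal encoding for the "Fourier-transformed" representation, a plain linear blend from source to target pose parameters across the sampling steps, and a toy 6-dimensional pose vector in place of real SMPL-X parameters. The function names `fourier_features` and `interpolated_pose_schedule` are hypothetical; the abstract does not specify any of these details.

```python
import numpy as np


def fourier_features(x, num_bands=4):
    """Encode each value v as sin(2^k * pi * v) and cos(2^k * pi * v), k < num_bands.

    Assumption: a NeRF-style sinusoidal encoding stands in for the
    Fourier transform mentioned in the abstract.
    """
    x = np.asarray(x, dtype=np.float64)
    freqs = (2.0 ** np.arange(num_bands)) * np.pi        # shape (num_bands,)
    angles = x[..., None] * freqs                        # shape (..., D, num_bands)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(*x.shape[:-1], -1)              # shape (..., D * 2 * num_bands)


def interpolated_pose_schedule(source_pose, target_pose, num_steps):
    """Return one pose vector per sampling step, blended from source to target.

    Assumption: linear interpolation in parameter space; the actual
    interpolation scheme is not given in the abstract.
    """
    source_pose = np.asarray(source_pose, dtype=np.float64)
    target_pose = np.asarray(target_pose, dtype=np.float64)
    weights = np.linspace(0.0, 1.0, num_steps)[:, None]  # shape (num_steps, 1)
    return (1.0 - weights) * source_pose + weights * target_pose


# Toy stand-ins for pose parameters (hypothetical values, not real SMPL-X output).
source = np.zeros(6)
target = np.full(6, 0.5)

poses = interpolated_pose_schedule(source, target, num_steps=5)
conditions = fourier_features(poses, num_bands=4)
print(conditions.shape)  # (5, 48): one encoded pose condition per sampling step
```

In a full pipeline, each row of `conditions` would be passed as the pose condition for the corresponding denoising step, so that the control signal moves gradually from the source pose toward the target pose rather than jumping directly to it.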
Dong et al. (Mon,) studied this question.