While recent work has made remarkable progress in general 3D shape generation, the automatic generation of wearable 3D assets remains largely unexplored. To address this gap, we present BAG, a Body-Aligned Asset Generation method that produces wearable 3D assets which can be automatically fitted onto a given 3D human body. This is achieved by conditioning the 3D generation process on human body shape and pose information. Specifically, we first construct a general single-image-to-consistent-multi-view diffusion model and train it on the large-scale Objaverse dataset to ensure diversity and generalizability. We then train a body-conditioned multi-view ControlNet to guide the generator toward producing body-aligned multi-view images. The control signal consists of multi-view 2D projections of the target human body, in which pixel values represent the XYZ coordinates of the body surface in a canonical space. The resulting body-conditioned multi-view diffusion model outputs body-aligned images, which are subsequently fed into a native 3D diffusion model to reconstruct the 3D shape of the asset. Finally, we recover the similarity transformation using multi-view silhouette supervision and mitigate asset-body penetration using physics-based simulation, ensuring accurate fitting of the asset onto the target body. Experimental results demonstrate that our method significantly outperforms existing approaches in terms of prompt adherence, shape diversity, and shape quality.
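To make the control signal concrete, the following is a minimal sketch of how a multi-view XYZ coordinate map could be rasterized from body vertices. It is an illustration under stated assumptions, not the pipeline used in the paper: it splats vertices as points rather than rasterizing mesh triangles, assumes orthographic cameras rotating about the vertical axis, and normalizes coordinates by the body's bounding box. The names `xyz_coordinate_map` and `yaw_rotation`, the resolution, and the four-view layout are all hypothetical.

```python
import numpy as np


def yaw_rotation(degrees):
    """Rotation about the vertical (y) axis; assumed camera orbit for the sketch."""
    t = np.deg2rad(degrees)
    return np.array([
        [np.cos(t), 0.0, np.sin(t)],
        [0.0, 1.0, 0.0],
        [-np.sin(t), 0.0, np.cos(t)],
    ])


def xyz_coordinate_map(vertices, view_rotation, resolution=64):
    """Splat body vertices into an image whose pixels store canonical XYZ.

    vertices: (N, 3) body surface points in canonical space.
    view_rotation: (3, 3) rotation from canonical space to the camera frame.
    Returns an (H, W, 3) image with values in [0, 1] and an (H, W) coverage mask.
    """
    # Normalize canonical coordinates to [0, 1] so they can be stored as colors.
    vmin = vertices.min(axis=0)
    vmax = vertices.max(axis=0)
    colors = (vertices - vmin) / np.maximum(vmax - vmin, 1e-8)

    # Rotate into the camera frame; the orthographic camera looks down -z.
    cam = vertices @ view_rotation.T

    # Map camera-frame x, y to pixel indices with one shared scale (no distortion).
    center = (cam.max(axis=0) + cam.min(axis=0)) / 2.0
    half_extent = np.abs(cam[:, :2] - center[:2]).max() + 1e-8
    uv = (cam[:, :2] - center[:2]) / half_extent  # roughly in [-1, 1]
    cols = ((uv[:, 0] + 1.0) / 2.0 * (resolution - 1)).round().astype(int)
    rows = ((1.0 - uv[:, 1]) / 2.0 * (resolution - 1)).round().astype(int)
    cols = np.clip(cols, 0, resolution - 1)
    rows = np.clip(rows, 0, resolution - 1)

    # Z-buffer: keep the point nearest the camera (largest z) at each pixel.
    image = np.zeros((resolution, resolution, 3))
    depth = np.full((resolution, resolution), -np.inf)
    for r, c, z, color in zip(rows, cols, cam[:, 2], colors):
        if z > depth[r, c]:
            depth[r, c] = z
            image[r, c] = color
    return image, np.isfinite(depth)


# Stand-in "body": random points on a unit sphere (a real body mesh is assumed elsewhere).
rng = np.random.default_rng(0)
points = rng.normal(size=(2000, 3))
points /= np.linalg.norm(points, axis=1, keepdims=True)

# Four views around the vertical axis, stacked as one multi-view control signal.
control = np.stack([xyz_coordinate_map(points, yaw_rotation(a))[0] for a in (0, 90, 180, 270)])
print(control.shape)
```

Because every pixel encodes a position on the body surface rather than an appearance, the same body point receives the same color in every view, which is what lets a per-view conditioning image carry cross-view correspondence.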
Luo et al. studied this question.