Virtual try-on (VTON) aims to synthesize fashion images of a person dressed in given garments and holds great potential in real-world scenarios. Existing methods generally build on single-view VTON: they train a warping model and then fit the given garments onto the human body under a fixed posture and viewpoint. This often fails to preserve consistent garment characteristics in across-view and multi-pose guided try-on scenarios, owing to the lack of both across-view data and effective view-consistency training. To alleviate this problem, we propose a new view consistency-driven VTON task (VC-VTON) and release a multi-view virtual try-on dataset with complete annotations (e.g., viewpoint, text, posture, and parsing maps) to encourage across-view training. Based on this dataset, we further propose VC-TwinNet, a Twin-UNet baseline based on spatiotemporal-aware view consistency training, designed specifically for this challenging task. Specifically, to enable view-aware denoising and sparse-to-continuous view generalization, we introduce RoPE and circle embedding to represent the relative and continuous positional relations across viewpoints, which serve to distinguish their outfitting appearance and warping states. Then, to implicitly learn the interactions across views under multiple given posture conditions, we contribute a spatiotemporal-aware view attention module that captures spatial and temporal details for across-view training. Moreover, we use an across-view consistency loss to supervise model training and ultimately improve the performance of VC-VTON. Extensive experiments demonstrate the superiority of our approach, which achieves state-of-the-art results on various evaluations without degrading single-view performance. As for practicality and timeliness, our proposed components are essentially plug-and-play and remain effective in the new DiT-centered paradigm.
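To make the viewpoint encoding idea concrete, the following is a minimal sketch, not the VC-TwinNet implementation. It assumes a viewpoint is given as a single azimuth angle in degrees; the function names `circle_embedding` and `rope_rotate`, the number of harmonics, and the frequency base are all hypothetical choices made for illustration. It shows two generic properties that such encodings are usually chosen for: a circular embedding treats 0° and 360° as the same view, and a RoPE-style rotation makes the query-key inner product depend only on the relative angle between two views.

```python
import math


def circle_embedding(angle_deg, num_harmonics=4):
    """Embed a view angle on the unit circle at integer harmonics.

    Hypothetical helper: integer harmonics keep the embedding periodic,
    so angles that differ by 360 degrees map to the same vector.
    """
    theta = math.radians(angle_deg)
    emb = []
    for k in range(1, num_harmonics + 1):
        emb.extend([math.cos(k * theta), math.sin(k * theta)])
    return emb


def rope_rotate(vec, angle_deg, base=10000.0):
    """Rotate consecutive feature pairs by angle-dependent phases (RoPE-style).

    Hypothetical helper: the view angle plays the role that the token
    position plays in standard rotary position embedding.
    """
    assert len(vec) % 2 == 0, "feature dimension must be even"
    half = len(vec) // 2
    theta = math.radians(angle_deg)
    out = []
    for i in range(half):
        phi = theta * base ** (-i / half)  # one frequency per feature pair
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(phi) - y * math.sin(phi))
        out.append(x * math.sin(phi) + y * math.cos(phi))
    return out


def dot(a, b):
    """Plain inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


# Toy query/key features (arbitrary values for illustration).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Both pairs of views are 60 degrees apart, so the scores should match.
score_a = dot(rope_rotate(q, 30.0), rope_rotate(k, 90.0))
score_b = dot(rope_rotate(q, 0.0), rope_rotate(k, 60.0))
```

Under these assumptions, `score_a` and `score_b` agree up to floating-point error, which is the relative-position property referred to above. Note that the rotary phases here use non-integer frequencies and are therefore not periodic over a full turn; in this sketch only the circular embedding has that property.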
Zx et al. (Thu,) studied this question.