What type of study is this?

This is a Quantitative Study study (also classified as: Experimental Study).

October 2, 2025Open Access

Splatting the Cat: Efficient Free-Viewpoint 3D Virtual Try-On via View-Decomposed LoRA and Gaussian Splatting

Key Points

Our framework achieves superior geometric consistency in 3D virtual try-on results.
Using Gaussian Splatting, we reconstruct a stable 3D scene with significant memory efficiency.
The CatVTON model generates diverse 2D images, guiding fine-tuning for improved garment details.
Iterative optimization allows for a free-viewpoint 3D VTON experience on standard consumer hardware.

Abstract

As Virtual Try-On (VTON) technology matures, 2D VTON methods based on diffusion models can now rapidly generate diverse and high-quality try-on results. However, with rising user demands for realism and immersion, many applications are shifting towards 3D VTON, which offers superior geometric and spatial consistency. Existing 3D VTON approaches commonly face challenges such as barriers to practical deployment, substantial memory requirements, and cross-view inconsistencies. To address these issues, we propose an efficient 3D VTON framework with robust multi-view consistency, whose core design is to decouple the monolithic 3D editing task into a four-stage cascade as follows: (1) We first reconstruct an initial 3D scene using 3D Gaussian Splatting, integrating the SMPL-X model at this stage as a strong geometric prior. By computing a normal-map loss and a geometric consistency loss, we ensure the structural stability of the initial human model across different views. (2) We employ the lightweight CatVTON to generate 2D try-on images, that provide visual guidance for the subsequent personalized fine-tuning tasks. (3) To accurately represent garment details from all angles, we partition the 2D dataset into three subsets—front, side, and back—and train a dedicated LoRA module for each subset on a pre-trained diffusion model. This strategy effectively mitigates the issue of blurred details that can occur when a single model attempts to learn global features. (4) An iterative optimization process then uses the generated 2D VTON images and specialized LoRA modules to edit the 3DGS scene, achieving 360-degree free-viewpoint VTON results. All our experiments were conducted on a single consumer-grade GPU with 24 GB of memory, a significant reduction from the 32 GB or more typically required by previous studies under similar data and parameter settings. Our method balances quality and memory requirement, significantly lowering the adoption barrier for 3D VTON technology.

Read Full Paperexternally

AIに質問

Bookmark

View Full Paper