What question did this study set out to answer?

The aim is to develop a framework that ensures consistent 3D generation from a single RGB image while addressing geometric and semantic challenges.

April 16, 2026

UniCross3D: Unified Cross‐View and Cross‐Domain Diffusion for Consistent Single‐Image 3D Generation

Key Points

Introduced a diffusion framework called UniCross3D for 3D generation.
Implemented cross-view latent regularization to enhance geometric consistency.
Applied a cross-domain mutual information objective to align synthesized color and normal maps.
Conducted extensive experiments comparing the new approach with state-of-the-art methods.
Achieved significantly improved view consistency compared to existing methods.
Enhanced semantic alignment in generated 3D models.
Produced higher-fidelity reconstructions under challenging textures and ambiguous viewpoints.

Abstract

Reconstructing detailed geometry and realistic appearance from a single RGB image is essential yet fundamentally challenging due to inherent ambiguities such as occlusion, lighting variations, and texture‐geometry entanglement. While recent diffusion‐based generative models have significantly improved novel view synthesis, existing approaches suffer from two critical limitations: lack of cross‐view geometric consistency and insufficient cross‐domain semantic alignment. To address these issues, we introduce UniCross3D, a unified cross‐view and cross‐domain diffusion framework designed explicitly for consistent and physically coherent 3D generation. UniCross3D features two novel contributions: (1) a cross‐view latent regularization that enforces cross‐view geometric consistency across synthesized viewpoints by penalizing latent variance, and (2) a cross‐domain mutual information objective grounded in the physics of image formation, explicitly aligning synthesized color and normal maps. Extensive experiments demonstrate that UniCross3D achieves significantly improved view consistency and semantic alignment over state‐of‐the‐art methods and yields higher‐fidelity reconstructions, particularly under challenging textures and ambiguous viewpoints.
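
The abstract names the two objectives but gives no formulas, so the following is only a rough sketch of the general ideas, not the authors' implementation. It assumes per-view latents are already aligned in a shared space and stacked into a `(num_views, latent_dim)` array, and it substitutes a simple histogram estimate of mutual information for whatever physically grounded, trainable objective the paper uses. The function names `cross_view_variance_penalty` and `histogram_mutual_information` are hypothetical.

```python
import numpy as np


def cross_view_variance_penalty(latents: np.ndarray) -> float:
    """Mean variance of latents across views (assumed form, not the paper's loss).

    latents: array of shape (num_views, latent_dim), one row per synthesized view.
    Returns 0.0 when every view has the same latent.
    """
    if latents.ndim != 2:
        raise ValueError("latents must have shape (num_views, latent_dim)")
    mean_latent = latents.mean(axis=0, keepdims=True)
    return float(np.mean((latents - mean_latent) ** 2))


def histogram_mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 16) -> float:
    """Histogram estimate of mutual information (in nats) between two maps.

    Illustrative stand-in only: a and b could be, for example, a grayscale
    color map and one channel of a normal map with the same number of pixels.
    """
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()                 # joint distribution over bin pairs
    p_a = p_ab.sum(axis=1, keepdims=True)      # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)      # marginal of b, shape (1, bins)
    independent = p_a @ p_b                    # product of marginals
    nonzero = p_ab > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(p_ab[nonzero] * np.log(p_ab[nonzero] / independent[nonzero])))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    views = rng.normal(size=(4, 8))
    print("variance penalty:", cross_view_variance_penalty(views))

    color = rng.random((64, 64))
    unrelated = rng.random((64, 64))
    print("MI(color, color):", histogram_mutual_information(color, color))
    print("MI(color, unrelated):", histogram_mutual_information(color, unrelated))
```

In a training setup, a variance term like the first function would be minimized so that latents for different views of the same object agree, while a mutual information term would be maximized so that color and normal predictions stay statistically dependent. A histogram estimator is not differentiable, so an actual diffusion training loop would need a differentiable surrogate.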


Cite This Study

Jun et al. studied this question.

synapsesocial.com/papers/69e07dc72f7e8953b7cbeb46 https://doi.org/10.1111/cgf.70378
