This paper presents MVDiffusion++, a neural architecture for 3D object reconstruction that synthesizes dense, high-resolution views of an object given one or a few images without camera poses. MVDiffusion++ achieves superior flexibility and scalability with two surprisingly simple ideas: 1) a "pose-free architecture", where standard self-attention among 2D latent features learns 3D consistency across an arbitrary number of conditional and generation views without explicitly using camera pose information; and 2) a "view dropout strategy" that discards a substantial number of output views during training, which reduces the training-time memory footprint and enables dense, high-resolution view synthesis at test time. We use Objaverse for training and Google Scanned Objects for evaluation with standard novel view synthesis and 3D reconstruction metrics, where MVDiffusion++ significantly outperforms the current state of the art. We also demonstrate a text-to-3D application example by combining MVDiffusion++ with a text-to-image generative model.
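The two ideas in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, view counts, and token sizes below are hypothetical, and the attention has no learned projections or diffusion model around it. It only shows that (a) tokens from all views can be pooled into one sequence and attended jointly with no pose input, and (b) dropping output views during training shortens that sequence.

```python
import math
import random


def drop_views(num_output_views, keep, rng):
    """View dropout (illustrative): keep a random subset of output-view indices."""
    if not 0 < keep <= num_output_views:
        raise ValueError("keep must be between 1 and num_output_views")
    return sorted(rng.sample(range(num_output_views), keep))


def flatten_views(views):
    """Pool the tokens of every view into one sequence; no camera pose is attached."""
    return [token for view in views for token in view]


def joint_self_attention(tokens):
    """Plain single-head self-attention over the pooled tokens.

    Assumption: queries, keys, and values are the tokens themselves
    (no learned weights), to keep the sketch short.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for query in tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * value[d] for w, value in zip(weights, tokens)) / total
             for d in range(dim)]
        )
    return outputs


# Hypothetical sizes: 1 conditional view, 8 output views, 4 tokens per view, 3 channels.
rng = random.Random(0)
num_cond, num_out, tokens_per_view, dim = 1, 8, 4, 3


def random_view():
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(tokens_per_view)]


cond_views = [random_view() for _ in range(num_cond)]
out_views = [random_view() for _ in range(num_out)]

# Training step: conditional views are always kept, most output views are dropped.
kept = drop_views(num_out, keep=3, rng=rng)
pooled = flatten_views(cond_views + [out_views[i] for i in kept])
attended = joint_self_attention(pooled)
```

Because self-attention cost grows with the square of the sequence length, attending over 4 views instead of 9 in this toy setting cuts the number of score computations from 36² to 16². At test time the same function accepts the full set of views, since nothing in it depends on how many views were pooled.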
Shitao Tang
Jiacheng Chen
Dilin Wang
Tang et al. studied this question.
www.synapsesocial.com/papers/68e78a66b6db6435876fcf31 — DOI: https://doi.org/10.48550/arxiv.2402.12712