Key points are not available for this paper at this time.
We introduce a method to generate 3D scenes that are disentangled into their component objects. This disentanglement is unsupervised, relying only on the knowledge of a large pretrained text-to-image model. Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene. Concretely, our method jointly optimizes multiple NeRFs from scratch - each representing its own object - along with a set of layouts that composite these objects into scenes. We then encourage these composited scenes to be in-distribution according to the image generator. We show that despite its simplicity, our approach successfully generates 3D scenes decomposed into individual objects, enabling new capabilities in text-to-3D content creation. For results and an interactive demo, see our project page at https://dave.ml/layoutlearning/
Building similarity graph...
Analyzing shared references across papers
Loading...
Epstein et al. (Mon,) studied this question.
www.synapsesocial.com/papers/68e779ebb6db6435876ee9c9 — DOI: https://doi.org/10.48550/arxiv.2402.16936
Dave Epstein
Ben Poole
Ben Mildenhall
Building similarity graph...
Analyzing shared references across papers
Loading...