Diffusion models and auto-regressive models have been widely adopted in recent research on 3D shape generation. However, diffusion models and standard next-token-prediction auto-regressive models require dozens to hundreds of generation steps, which makes inference slow. A newer paradigm, auto-regressive modeling with next-scale prediction, has demonstrated both inference efficiency and generation quality. Yet existing next-scale works are restricted to fixed-length representations and cannot be directly applied to more adaptive variable-length representations, such as sparse voxels. To address these limitations, we propose Grow3D, a new auto-regressive generative framework that generates 3D shapes in a coarse-to-fine manner via "next-scale prediction", achieving both high quality and fast inference. Specifically, we first employ a Vector Quantized Variational Autoencoder (VQ-VAE) with residual serialization to encode 3D shapes into a multiscale, sparse-structured latent representation with quantized features. Building on this latent representation, we use a next-scale prediction strategy to auto-regressively generate both the octree structure and the corresponding geometry features. Exploiting the inherent structure of the octree, we introduce an octree-structure-aware attention mechanism that selectively attends to the most relevant features. Furthermore, we propose a sampling strategy based on classifier-free guidance (CFG) to enhance the quality and diversity of generation. Extensive experiments demonstrate that Grow3D outperforms state-of-the-art methods in both 3D shape generation quality and speed, enabling real-time downstream applications such as interactive 3D editing.
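The abstract does not spell out how the CFG-based sampling strategy combines predictions. As a rough illustration only, the sketch below shows the standard classifier-free guidance rule applied to token logits; it assumes the common formulation (unconditional logits plus a scaled difference), and the names `cfg_logits`, `softmax`, and the example values are hypothetical rather than taken from Grow3D.

```python
import math

def cfg_logits(cond, uncond, scale):
    """Standard classifier-free guidance on logits (an assumed formulation,
    not necessarily the exact rule used by Grow3D).

    scale = 0 returns the unconditional logits, scale = 1 the conditional
    ones, and scale > 1 pushes the distribution further toward the condition.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits over three codebook entries at one position.
cond = [2.0, 0.5, -1.0]    # model output given the condition
uncond = [1.0, 1.0, 1.0]   # model output with the condition dropped

guided = cfg_logits(cond, uncond, scale=2.0)
probs = softmax(guided)    # distribution to sample the next token from
```

A larger guidance scale usually trades diversity for fidelity to the condition, so in practice the scale is a tuning knob rather than a fixed constant.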
Feng et al. (Thu,) studied this question.