What question did this study set out to answer?

This research aims to develop an efficient framework for high-quality 3D shape generation using next-scale prediction techniques.

June 13, 2026

Grow3D: Hierarchical Next-Scale Octree Prediction for Fast and High-Fidelity 3D Shape Generation

Key Points

Encodes 3D shapes into a multi-scale, sparse latent representation using a Vector Quantized Variational Autoencoder (VQ-VAE).
Generates the octree structure and its geometry features auto-regressively through next-scale prediction.
Introduces an octree-structure-aware attention mechanism that selectively attends to the most relevant features.
Outperforms state-of-the-art methods in 3D shape generation quality, according to the authors' experiments.
Runs inference fast enough to enable some real-time downstream applications, such as interactive 3D editing.
Improves the quality and diversity of generated shapes through a CFG-based sampling strategy.
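
The summary does not describe how the CFG-based sampling strategy works. As a rough illustration only, the sketch below assumes CFG refers to classifier-free guidance in its standard form, where conditional and unconditional logits are blended with a guidance scale before sampling. The function names and the toy logits are hypothetical and are not taken from Grow3D.

```python
import math

def cfg_logits(cond, uncond, scale):
    """Standard classifier-free guidance (assumed form, not Grow3D's exact rule).

    scale = 1.0 returns the conditional logits unchanged; larger values
    push the distribution further away from the unconditional prediction.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits over a 3-entry codebook (made-up values for illustration).
cond = [2.0, 1.0, 0.0]
uncond = [1.0, 1.0, 1.0]

guided = cfg_logits(cond, uncond, scale=2.0)
probs = softmax(guided)
```

With a scale above 1, the token favored by the condition receives more probability mass than under the plain conditional distribution. This typically sharpens samples and reduces their variety, so the scale has to be tuned.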

Abstract

Diffusion models and auto-regressive models have been widely adopted in recent research on 3D shape generation. However, diffusion models and standard next-token prediction auto-regressive models require dozens to hundreds of generation steps, leading to slow inference. A new paradigm, auto-regressive generation with next-scale prediction, has demonstrated both inference efficiency and generation quality. However, existing works are restricted to fixed-length representations and cannot be directly applied to more adaptive variable-length representations, such as sparse voxels. To address these limitations, we propose Grow3D, a new auto-regressive generative framework that generates high-quality 3D shapes in a coarse-to-fine manner via "next-scale prediction", achieving both high quality and fast inference. Specifically, we first employ a Vector Quantized Variational Autoencoder (VQ-VAE) with residual serialization to encode 3D shapes into a multi-scale, sparse-structured latent representation with quantized features. Building on this latent representation, we use a next-scale prediction strategy to auto-regressively generate both the octree structure and the corresponding geometry features. Benefiting from the inherent structure of the octree, we introduce an octree-structure-aware attention mechanism that selectively attends to the most relevant features. Furthermore, we propose a CFG-based sampling strategy to enhance the quality and diversity of generation. Extensive experiments demonstrate that Grow3D outperforms state-of-the-art methods in both 3D shape generation quality and speed, enabling some real-time downstream applications, such as interactive 3D editing.
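
To make the coarse-to-fine idea concrete, here is a minimal sketch of growing a sparse octree one scale at a time. It is an illustration under stated assumptions, not the authors' implementation: `predict_split` is a hypothetical stand-in for the generative model, `toy_predictor` is a made-up rule, and the prediction of geometry features is omitted entirely.

```python
from typing import Callable, List, Tuple

Node = Tuple[int, int, int]  # integer voxel coordinates at a given depth

def children(node: Node) -> List[Node]:
    """Return the 8 child voxels of a node at the next finer depth."""
    x, y, z = node
    return [(2 * x + dx, 2 * y + dy, 2 * z + dz)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]

def grow_octree(
    predict_split: Callable[[List[List[Node]], List[Node]], List[bool]],
    depth: int,
) -> List[List[Node]]:
    """Grow a sparse octree with one prediction call per scale.

    predict_split (hypothetical) receives all coarser scales plus the
    candidate children and returns one occupancy flag per candidate.
    """
    scales: List[List[Node]] = [[(0, 0, 0)]]  # scale 0 is the root cell
    for _ in range(depth):
        # Only children of occupied nodes are candidates, so the number
        # of tokens per scale varies with the shape.
        candidates = [c for n in scales[-1] for c in children(n)]
        keep = predict_split(scales, candidates)
        scales.append([c for c, k in zip(candidates, keep) if k])
    return scales

def toy_predictor(history: List[List[Node]], candidates: List[Node]) -> List[bool]:
    """Made-up rule: keep only voxels on the main diagonal."""
    return [x == y == z for (x, y, z) in candidates]

scales = grow_octree(toy_predictor, depth=3)
sizes = [len(s) for s in scales]
```

The model is called once per scale rather than once per token, which is where the step-count saving over next-token prediction comes from. Because only the children of occupied nodes are expanded, each scale has a shape-dependent length, which is the variable-length setting the abstract says fixed-length methods cannot handle directly.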
