Abstract

The advancement of generative Artificial Intelligence (AI), particularly diffusion models and 3D Gaussian Splatting (3DGS), has opened new avenues for manipulating and synthesizing 3D models. However, current 3D editing methods focus primarily on global style transfer or constrained geometric deformation. They struggle to perform fine-grained, part-level manipulations guided by text prompts, especially complex edits that require simultaneous changes to both geometry and appearance. Many existing approaches operate at the rendering level, which hinders the creation of new geometric structures. To overcome these limitations, we propose Mask2-3D, a diffusion-based framework for prompt-driven, part-level 3D editing. At the core of the framework is a learnable, multi-view mask generator that predicts a coherent editing region rather than merely segmenting existing contours. Because the predicted region is not bound to the part's current outline, an edit can introduce new shape structures and substantial geometric modifications. The system further integrates a LoRA-finetuned diffusion model for high-fidelity content synthesis and style transfer within the designated regions, and a subsequent re-rendering step enforces multi-view consistency. With this workflow, Mask2-3D enables precise, flexible, and structurally sound local editing of 3D models through natural language commands, significantly improving the intuitiveness and creative freedom of 3D content creation.
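The role of the editing mask can be illustrated with a minimal sketch, assuming each rendered view is a grayscale image stored as nested lists and each mask holds per-pixel weights in [0, 1]. This is not the Mask2-3D implementation: `composite_view`, `composite_views`, and `dilate` are hypothetical helpers, and `dilate` is a hand-written stand-in for the learned mask generator. The sketch shows why the region matters: pixels outside the mask are copied from the original render, so an edit can only add new structure where the mask allows it. Growing the mask beyond a part's current contour is what leaves room for new geometry.

```python
from typing import List

# Hypothetical representations for illustration only.
Image = List[List[float]]  # grayscale view: rows of pixel intensities
Mask = List[List[float]]   # per-pixel weights in [0, 1]; 1 = editable


def composite_view(original: Image, edited: Image, mask: Mask) -> Image:
    """Keep original pixels outside the mask; take edited pixels inside it."""
    if not (len(original) == len(edited) == len(mask)):
        raise ValueError("original, edited, and mask must have the same height")
    out: Image = []
    for row_o, row_e, row_m in zip(original, edited, mask):
        if not (len(row_o) == len(row_e) == len(row_m)):
            raise ValueError("rows must have matching widths")
        # Linear blend: m * edited + (1 - m) * original.
        out.append([m * e + (1.0 - m) * o for o, e, m in zip(row_o, row_e, row_m)])
    return out


def composite_views(
    originals: List[Image], editeds: List[Image], masks: List[Mask]
) -> List[Image]:
    """Apply the masked blend independently to every rendered view."""
    if not (len(originals) == len(editeds) == len(masks)):
        raise ValueError("need one edited image and one mask per view")
    return [composite_view(o, e, m) for o, e, m in zip(originals, editeds, masks)]


def dilate(mask: Mask, radius: int = 1) -> Mask:
    """Grow a binary mask by `radius` pixels (square neighborhood).

    Stand-in for a learned region predictor: it lets the editable area
    extend past the part's existing contour.
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    out: Mask = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w and mask[yy][xx] >= 0.5:
                        out[y][x] = 1.0
    return out


if __name__ == "__main__":
    orig = [[0.0, 0.0], [0.0, 0.0]]
    edit = [[1.0, 1.0], [1.0, 1.0]]
    mask = [[1.0, 0.0], [0.5, 0.0]]
    print(composite_view(orig, edit, mask))  # [[1.0, 0.0], [0.5, 0.0]]
    print(dilate([[1, 0, 0]]))               # [[1.0, 1.0, 0.0]]
```

Fractional mask values give a linear cross-fade at the region boundary, which avoids a hard seam between edited and original pixels.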
Zhu et al. studied this question.