Abstract Recent advances in text‐to‐3D generation have produced high‐fidelity 3D content. However, existing approaches offer little fine‐grained control over the results, which makes controllable 3D content hard to produce. Hand‐drawn sketches are concise and intuitive, and so lend themselves to interactive 3D control, but their ambiguity complicates text‐to‐3D pipelines. We introduce CanvasDream, a sketch‐guided text‐to‐3D generation approach that produces high‐fidelity 3D assets with realistic textures and rich details while remaining faithful to both the textual description and the sketch composition. We design a two‐stage 3D generation framework that disentangles geometry and appearance learning. For geometry, we use 3D Gaussian Splatting to generate the object shape under the guidance of our proposed sketch‐guided multi‐view diffusion model. For appearance, we model materials with Physically‐Based Rendering (PBR), so that the resulting assets can be used directly in downstream applications. Extensive experiments show that our method outperforms state‐of‐the‐art approaches in 3D generation, and user studies confirm its high controllability and practical value.
Zheng et al. (Thu,) studied this question.