This paper presents SEEAvatar, a novel approach for generating photorealistic 3D avatars from text descriptions. Although recent text-to-3D avatar generation methods have shown promising results, their joint representation and optimization of geometry and appearance often yields coarse results and limits practical applications. Our method introduces novel constraints for decoupled geometry and appearance. First, we constrain geometric optimization using a template avatar, which evolves periodically to enable flexible shape generation while maintaining a plausible human shape. Detailed geometric features of the face and hands are additionally preserved using static human priors. Second, we leverage diffusion models to guide a physically based rendering pipeline for texture generation, incorporating a lightness constraint on albedo textures to suppress incorrect lighting effects. Experimental results demonstrate that our method significantly outperforms existing methods in both global and local geometry quality as well as appearance fidelity. The high-quality meshes and textures produced by our approach are directly compatible with traditional graphics pipelines, enabling immediate practical applications. Project page: https://yoxu515.github.io/SEEAvatar/.
Xu et al. (Wed,) studied this question.