Accurate relighting is key to realism and immersion in virtual avatars. We propose a novel pipeline for building personalized, relightable avatars from a monocular video captured under unknown lighting. This minimal input poses two challenges: material entanglement and novel-view inconsistency. To tackle these, we introduce a disentangled dynamic 3D Gaussian representation that models diverse material properties and supports photorealistic rendering and animation via a parametric face model. To resolve material ambiguity under uncontrolled lighting, we train a 2D diffusion-based model to predict canonical-lighting images and physically based material maps from casually lit portraits. These predictions serve as supervisory signals that guide the 3D disentanglement process. Additionally, we incorporate a 3D prior to enhance novel-view consistency, improving geometry and appearance in unseen views. Experiments demonstrate that our approach significantly improves reconstruction quality and relighting fidelity, offering a practical and cost-effective solution for creating high-quality personalized avatars.
Chen et al. studied this question.
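To make the supervision idea in the abstract concrete, the following is a minimal sketch of how material maps predicted by a 2D model could serve as pseudo ground truth for maps rendered from a 3D representation. It is an illustration under stated assumptions, not the paper's actual loss: the function name `material_supervision_loss`, the map names `albedo` and `roughness`, the masked L1 form, and the per-map weights are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def material_supervision_loss(rendered, predicted, mask, weights=None):
    """Masked L1 loss between rendered material maps and 2D-model predictions.

    Hypothetical helper for illustration only.
    rendered, predicted: dicts mapping a map name to an (H, W, C) float array.
    mask: (H, W) float array, 1.0 on foreground (face) pixels, 0.0 elsewhere.
    weights: optional dict of per-map scalar weights (default 1.0 each).
    """
    weights = weights or {}
    # Normalize by the number of foreground pixels; guard against an empty mask.
    denom = max(float(mask.sum()), 1.0)
    total = 0.0
    for name, target in predicted.items():
        # Per-pixel absolute error, averaged over channels -> shape (H, W).
        diff = np.abs(rendered[name] - target).mean(axis=-1)
        total += weights.get(name, 1.0) * float((diff * mask).sum()) / denom
    return total


# Toy example: the map names and values are assumptions, not data from the paper.
h, w = 2, 2
rendered_maps = {
    "albedo": np.zeros((h, w, 3)),
    "roughness": np.full((h, w, 1), 0.5),
}
predicted_maps = {
    "albedo": np.ones((h, w, 3)),
    "roughness": np.full((h, w, 1), 0.5),
}
foreground = np.ones((h, w))
loss = material_supervision_loss(rendered_maps, predicted_maps, foreground)
```

In a real training loop this term would be computed on differentiable renders and combined with photometric and regularization terms; the NumPy version here only shows the shape of the computation.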