Portrait relighting shows great potential in photography, film, and AR by simulating diverse lighting effects. Existing state-of-the-art methods often rely on expensive paired one-light-at-a-time (OLAT) or synthetic data, which limits scalability. Moreover, accurately modeling the interaction between physics-guided rendering, neural rendering, and real-world observations remains challenging. To address these issues, we propose a multi-stage self-supervised relighting framework. It progressively refines intrinsic scene properties via a simple-to-complex training strategy, removing the need for expensive paired data while adapting to various lighting conditions. One core design is a pre-training approach that uses diverse shading-based masking for self-reconstruction, which improves the model's perception of complex lighting variations. Furthermore, we introduce two perceptual modules that leverage the linear superposition of light to narrow the gap between physics-guided and neural rendering and to better align relit results with real-world observations. Extensive experiments demonstrate that our unified framework achieves new state-of-the-art performance in portrait relighting, surpassing recent methods in photorealism, synthesis quality, and identity preservation. It provides a practical paradigm for high-fidelity relighting under diverse lighting.
Guo et al. (Thu,) studied this question.
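The linear superposition of light mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the proposed perceptual modules, only the underlying physical property: in linear radiance space, an image lit by several sources equals the weighted sum of the images lit by each source alone. The function name `relight_from_basis`, the array shapes, and the random stand-in data are assumptions made for illustration.

```python
import numpy as np


def relight_from_basis(basis_images, light_weights):
    """Combine per-light images into one relit image (hypothetical helper).

    basis_images:  (N, H, W, 3) array, one linear-radiance image per light.
    light_weights: (N,) scalar intensities or (N, 3) RGB intensities.
    Returns an (H, W, 3) image equal to the weighted sum over the N lights.
    """
    basis = np.asarray(basis_images, dtype=np.float64)
    weights = np.asarray(light_weights, dtype=np.float64)
    if weights.ndim == 1:
        # Scalar intensity per light: reuse it for all three color channels.
        weights = weights[:, None]
    weights = np.broadcast_to(weights, (basis.shape[0], basis.shape[-1]))
    # Sum over the light index n, keeping pixels (h, w) and channels (c).
    return np.einsum("nhwc,nc->hwc", basis, weights)


# Random stand-in data (assumption): 4 lights, 8x8 RGB images.
rng = np.random.default_rng(0)
basis = rng.random((4, 8, 8, 3))
env_a = rng.random(4)
env_b = rng.random(4)

# Superposition: lighting with (a + b) equals lighting with a plus lighting with b.
combined = relight_from_basis(basis, env_a + env_b)
separate = relight_from_basis(basis, env_a) + relight_from_basis(basis, env_b)
superposition_holds = bool(np.allclose(combined, separate))
```

The property holds only for linear (not gamma-encoded or tone-mapped) pixel values, which is one reason a consistency check of this kind can serve as a self-supervised signal without paired ground truth.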