Blind Face Restoration (BFR) aims to reconstruct high-quality face images from low-quality inputs without prior knowledge of the degradation types or levels. Recent GAN- and diffusion-based approaches have greatly improved perceptual realism and reconstruction fidelity. However, existing approaches typically rely solely on visual cues from the degraded image, which often leads to inaccurate reconstruction of facial details and noticeable identity distortion, particularly under severe or complex degradations. To address these limitations, we incorporate auxiliary textual information into BFR to enable the recovery of subtle facial attributes, such as wrinkles, moles, and skin marks, that are often overlooked or hard to reconstruct with conventional visual priors. To support this idea, we first construct a large-scale dataset of 30,000 detailed textual descriptions paired with CelebA-HQ face images, explicitly designed to capture fine-grained facial semantics. To bridge the gap between visual data and natural language, we further propose FaceCLIP, a fine-tuned vision-language model tailored to the human face. FaceCLIP aligns face images with their corresponding textual descriptions more accurately by capturing the nuanced semantic cues critical for faithful face reconstruction. Building on these foundations, we propose Text-guided Blind Face Restoration (TBFR), a novel diffusion-based framework that explicitly integrates textual guidance into the face restoration pipeline. Within TBFR, a text-guided hybrid attention block fuses visual and textual features, while a text-aware loss enforces semantic consistency between the generated images and their associated textual descriptions. Extensive experiments show that TBFR outperforms state-of-the-art BFR methods in both quantitative metrics and subjective perceptual quality, establishing a new benchmark for BFR tasks.
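As a rough illustration of what a loss enforcing semantic consistency between a generated image and its description could look like, the sketch below computes a cosine-distance penalty between an image embedding and a text embedding. This is an assumption made for illustration only: the abstract does not specify the form of the text-aware loss, and the function names `cosine_similarity` and `text_aware_loss`, as well as the use of plain Python lists in place of FaceCLIP embeddings, are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def text_aware_loss(image_emb: Sequence[float], text_emb: Sequence[float]) -> float:
    """Hypothetical text-aware loss: 1 - cos(image_emb, text_emb).

    Assumption: both embeddings come from a shared vision-language space
    (such as the one FaceCLIP is described as providing). The loss is 0 when
    the embeddings point in the same direction and grows up to 2 as they
    diverge.
    """
    return 1.0 - cosine_similarity(image_emb, text_emb)
```

In a training loop, a term of this kind would typically be added to the diffusion objective with a weighting coefficient; the actual formulation and weighting used by TBFR are not stated here.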
An et al. (Thu,) studied this question.