Face unmasking is a critical image restoration task because masks conceal essential facial features such as the mouth, nose, and chin. Current inpainting methods often struggle with structural fidelity under large-area occlusions, producing blurred or inconsistent results. To address this gap, we propose the Masked-to-Unmasked Network (M2UNet), a segmentation-guided generative framework. M2UNet uses a segmentation-derived mask prior to localize occluded regions and a multi-scale, attention-enhanced generator to restore fine-grained facial textures. The framework aims to produce visually and semantically plausible reconstructions that preserve the structure of the face. Evaluated on a synthetic masked-face dataset derived from CelebA, M2UNet achieves state-of-the-art performance, with a PSNR of 31.3375 dB and an SSIM of 0.9576. It significantly outperforms recent inpainting methods while maintaining high computational efficiency.
Mahmoud et al. studied this question.
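For readers unfamiliar with the PSNR figure reported above, the following is a minimal sketch of how peak signal-to-noise ratio is conventionally computed between a ground-truth image and a restored one. It is not the authors' evaluation code. The function name `psnr`, the flat pixel-sequence input, and the 8-bit peak value of 255 are assumptions made for illustration; the abstract does not state the pixel range or how scores were averaged over the dataset.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    `max_value` is the largest possible pixel intensity (255 assumed here for
    8-bit images; use 1.0 for images normalized to [0, 1]).
    """
    if len(reference) != len(restored):
        raise ValueError("reference and restored must have the same length")
    if len(reference) == 0:
        raise ValueError("inputs must not be empty")

    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)


# A maximally wrong restoration of a two-pixel image gives 0 dB.
print(psnr([0, 255], [255, 0]))  # 0.0
```

Higher values indicate a smaller mean squared error relative to the peak intensity. Because PSNR is a purely pixel-wise measure, it is commonly reported alongside a structural metric such as SSIM, as the abstract does.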