Image inpainting aims to recover missing regions of damaged images while preserving structural coherence and realistic texture. Although deep learning methods based on generative adversarial networks (GANs) have made significant progress, they still struggle to model long-range dependencies and to maintain semantic consistency, especially when large regions are missing. To address these issues, we propose a multi-stage restoration framework. The coarse restoration stage incorporates attention through a transformer architecture, while the refinement stage introduces a plug-and-play channel-frequency encoder (CF-Encoder). The CF-Encoder models both global structure and local detail by hierarchically extracting and enhancing features through frequency-domain decomposition combined with an adaptive spatial-channel attention mechanism. Furthermore, we employ a bi-discriminator fusion mechanism to stabilize training and improve perceptual quality. Experiments on multiple benchmark datasets show that our method achieves superior performance in both quantitative metrics and visual fidelity, with particularly notable advantages when large portions of the image are missing.
Wang et al. studied this question.