Although recent image inpainting techniques have improved significantly, they often struggle to produce realistic image structures, particularly when filling large holes in intricate images. Moreover, for reasons of computational efficiency, networks are commonly trained on low-resolution images. To address the restoration of high-resolution images with large missing regions, this paper proposes a new method called Multi-Conv-Transformer. We combine the advantages of Transformers and CNNs so that the model can be trained efficiently on high-resolution images. We also introduce a customized transformer block optimized specifically for inpainting. Within this block, the proposed multi-head self-attention module collects non-local information only from valid tokens identified by a dynamic mask, thus prioritizing the partial regions of the image. Experimental results demonstrate that our model performs well at restoration across a variety of scenarios.
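The idea of attending only to valid tokens can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `masked_self_attention`, the use of a single head, and the identity projections for queries, keys, and values are all assumptions made to keep the example short, and the rule by which the dynamic mask is updated is not described in the abstract, so it is omitted here.

```python
import math


def masked_self_attention(tokens, valid):
    """Single-head self-attention where only valid tokens act as keys/values.

    tokens: list of equal-length feature vectors (lists of floats).
    valid:  list of bools, True where the token comes from a known region.
    Hypothetical simplification: Q, K and V are the tokens themselves
    (identity projections); a real block would learn these projections.
    """
    if not any(valid):
        raise ValueError("at least one token must be valid")
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    # Keys and values are restricted to valid tokens; skipping the others
    # is equivalent to setting their scores to -inf before the softmax.
    valid_tokens = [t for t, ok in zip(tokens, valid) if ok]
    outputs = []
    for query in tokens:
        scores = [
            scale * sum(q * k for q, k in zip(query, key))
            for key in valid_tokens
        ]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output is a weighted average of the valid tokens only.
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, valid_tokens))
            for d in range(dim)
        ])
    return outputs
```

In this sketch every token, including those in the hole, still issues a query, but its output is built exclusively from valid tokens, so the placeholder content of missing regions never leaks into the known ones.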
Zhou et al. (Fri,) studied this question.