Remote sensing images often contain many similar components, such as buildings, roads, and water surfaces, which have similar spectra and spatial structures. Although convolutional neural networks (CNNs) based on residual learning can provide excellent performance in pansharpening, existing methods do not make full use of this intrinsically similar information in images. Moreover, because the convolution operation focuses on a local region, position-independent global information is difficult to obtain even in a deep network. In this article, an efficient nonlocal attention residual network (NLRNet) is proposed to capture the similar contextual dependencies of all pixels. Specifically, to reduce the training difficulty caused by the original nonlocal attention, we propose an efficient nonlocal attention (ENLA) mechanism and employ residual with zero initialization (ReZero) technology so that the signal propagates easily through the network. Furthermore, a spectral aggregation module (SpecAM) is proposed to generate fused images and adjust the corresponding spectral information. Experimental results on the QuickBird and WorldView3 data sets show that the proposed method is competitive with other advanced methods in terms of both quality assessment and visual perception.
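To make the two ideas named in the abstract concrete, the following is a minimal sketch, not the authors' ENLA or NLRNet implementation. It assumes the generic dot-product form of nonlocal attention (every pixel attends to every other pixel) and the usual ReZero formulation, in which the residual branch is scaled by a gate initialized to zero. The function names `nonlocal_attention` and `rezero_block`, the parameter `alpha`, and the flattened list-of-pixels layout are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def nonlocal_attention(pixels):
    """Generic dot-product nonlocal attention (illustrative, not ENLA).

    pixels: list of N feature vectors, one per pixel, each of length C.
    Each output vector is a weighted average of ALL pixel vectors, so
    similar pixels contribute regardless of where they sit in the image.
    """
    channels = len(pixels[0])
    output = []
    for query in pixels:
        # Similarity of this pixel to every pixel in the image.
        scores = [sum(q * k for q, k in zip(query, key)) for key in pixels]
        weights = softmax(scores)
        output.append([
            sum(w * p[c] for w, p in zip(weights, pixels))
            for c in range(channels)
        ])
    return output


def rezero_block(pixels, alpha=0.0):
    """ReZero-style residual: x + alpha * F(x), with alpha starting at 0.

    With alpha = 0 the block is an exact identity map, so the input
    passes through unchanged at the start of training; alpha would be a
    learned scalar in a real network (here it is a plain argument).
    """
    attended = nonlocal_attention(pixels)
    return [
        [x + alpha * a for x, a in zip(pixel, att)]
        for pixel, att in zip(pixels, attended)
    ]
```

At `alpha = 0.0` the block returns its input unchanged, which is the property that lets the signal pass through a deep stack of such blocks at initialization. Note that this naive form compares every pixel with every other pixel, so its cost grows quadratically with the number of pixels.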
Lei et al. studied this question.