Remote sensing change detection (CD) quantitatively analyzes and characterizes land-surface change from bi-temporal remote sensing data. Deep convolutional networks have achieved remarkable success in CD tasks. However, owing to the complexity of natural lighting conditions and other factors, how to exploit bi-temporal images to segment changed objects more accurately and efficiently remains a focus of research. Many existing studies overlook the relationships between samples, disregarding the potential connections among regions with the same semantics across the entire sample set. They also ignore the semantic connection between the two bi-temporal images and fuse bi-temporal features with simple operations such as concatenation or absolute-value subtraction, which results in information loss. To address these issues, we propose a cross-image feature interaction network consisting of three modules: a cross-image non-local enhancement (CINE) module, which strengthens the spatial links between objects of the same type in the sample space and explores the potential relationships between samples with the same semantics across the whole sample set; a cross-temporal feature enhancement (CTFE) module, which lets the bi-temporal image features interact to enhance real change features while suppressing irrelevant ones; and a difference feature adaptive fusion (DFAF) module, which makes effective use of the bi-temporal features extracted by the network and adaptively learns the fusion parameters. In extensive experiments on two CD datasets, LEVIR-CD and DSIFN-CD, the network achieves F1-score/IoU values of 90.75%/83.07% and 69.94%/53.78%, respectively. Our approach surpasses existing attention-based methods such as BIT.
Han et al. studied this question.
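For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how F1-score and IoU are commonly computed for the "changed" class of a binary change map. It is not taken from the authors' code: the function names and the toy maps are hypothetical, and the evaluation protocol actually used in the paper may differ in detail. The sketch also uses the standard identity IoU = F1 / (2 - F1), which holds whenever both metrics are computed from the same true-positive, false-positive, and false-negative counts.

```python
def confusion_counts(pred, truth):
    """Count (TP, FP, FN) over two binary change maps given as nested lists.

    A value of 1 marks a changed pixel, 0 an unchanged one.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # predicted change, real change
            elif p and not t:
                fp += 1          # predicted change, no real change
            elif t and not p:
                fn += 1          # missed real change
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def iou(tp, fp, fn):
    """IoU = TP / (TP + FP + FN); returns 0.0 when undefined."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def iou_from_f1(f1):
    """Convert F1 to IoU, assuming both come from the same counts."""
    return f1 / (2.0 - f1)


# Hypothetical 3x4 toy maps, for illustration only.
pred = [[1, 1, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1]]
truth = [[1, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 1, 1]]

tp, fp, fn = confusion_counts(pred, truth)
print(tp, fp, fn)                 # 3 1 1
print(f1_score(tp, fp, fn))       # 0.75
print(iou(tp, fp, fn))            # 0.6

# Sanity check against the figures quoted in the abstract.
print(round(100 * iou_from_f1(0.9075), 2))  # 83.07
print(round(100 * iou_from_f1(0.6994), 2))  # 53.78
```

Applying the identity to the reported F1-scores reproduces the reported IoU values, which supports reading each pair as the F1-score/IoU for one dataset.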