Change detection from bi-temporal satellite imagery is a core problem in Earth monitoring. It aims to identify and localize surface changes between two acquisition dates while distinguishing actual changes, such as man-made developments, from pseudo-changes caused by seasonal cycles or atmospheric factors. Current deep learning methods typically suffer from a unidirectional bias in their temporal modeling, which limits their ability to exploit large-scale spatial relationships when characterizing change patterns. Recent transformer-based techniques are highly accurate, but their computational requirements are large, which limits their use in resource-constrained settings. To address these limitations, this paper proposes an Interactive Multiscale Attention Fusion (IMAF) network with Sparse Feature Convolution (SFC) based on Ghost modules for efficient multi-scale feature extraction. The network uses an encoder-decoder architecture supplemented with interactive attention modules at different scales. It operates with complementary forward and backward attention streams: the forward stream captures progressive temporal variation, and the backward stream verifies the consistency of the changes through reverse temporal analysis. This interactive design represents bidirectional temporal relationships without being computationally prohibitive, improving the ability to distinguish genuine land cover changes from noise-induced variation. Experiments are conducted on two benchmark datasets: the Onera Satellite Change Detection (OSCD) dataset and the SZTAKI AirChange dataset. On OSCD, the proposed approach attains competitive F1-scores of 58.14% (13 channels) and 50.68% (3 channels) with just 2.13M parameters and 19.6G FLOPs, which corresponds to 19 times fewer parameters and 5.6x faster inference than ChangeFormer.
On the Szada/1 and Tiszadob/3 test samples of the SZTAKI dataset, the network achieves competitive F1-scores of 74.29% and 92.86%, respectively. This combination of accuracy and efficiency makes the approach suitable for deployment on edge devices, in real-time analytics, and in large-scale mapping systems, where operational energy consumption depends directly on computational cost.
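To give a rough sense of why Ghost modules reduce the parameter count, the sketch below compares a standard convolution with a Ghost module under the original GhostNet formulation: a primary convolution produces a fraction of the output channels, and a cheap depthwise operation generates the rest. This is an illustrative assumption, not the SFC block of this paper. The function names, the channel sizes, and the ratio `s` are hypothetical choices made only for the example, and biases are ignored.

```python
import math


def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (no bias)."""
    return c_in * c_out * k * k


def ghost_params(c_in: int, c_out: int, k: int = 1, d: int = 3, s: int = 2) -> int:
    """Weights in a Ghost module as described for GhostNet (no bias).

    A primary k x k convolution produces m = ceil(c_out / s) intrinsic
    feature maps; a depthwise d x d convolution then derives the remaining
    m * (s - 1) "ghost" maps, one filter per output map.
    """
    m = math.ceil(c_out / s)
    primary = c_in * m * k * k
    cheap = m * (s - 1) * d * d
    return primary + cheap


# Hypothetical layer sizes, chosen only for illustration.
standard = conv_params(64, 128, 3)
ghost = ghost_params(64, 128, k=3, d=3, s=2)
print(standard, ghost, round(standard / ghost, 2))
```

With a ratio of `s = 2`, the Ghost variant needs roughly half the weights of the standard layer in this toy setting. The overall savings reported for the full network depend on its actual architecture, which this sketch does not reproduce.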
Gupta et al. (Tue,) studied this question.