This paper introduces a lightweight, multi-scale, dual-attention deep learning model (MS-DANet) for land cover classification of high-resolution remote sensing images. It targets three challenges: high inter-class similarity, large multi-scale differences, and constraints on model deployment. The model follows an encoder-decoder structure. The encoder uses MobileNetV2 to extract multi-level features, and the decoder combines parallel atrous spatial pyramid pooling (ASPP) and feature pyramid network (FPN) structures to fuse global semantics with local detail. A channel-spatial dual attention module is embedded to adaptively enhance key features and reduce category confusion. To achieve both high accuracy and high efficiency, the model adopts depthwise separable convolution (DSC) and knowledge distillation, which compress the parameter count to 4.8M. Experimental results on the ISPRS Vaihingen dataset show that MS-DANet reaches an overall accuracy of 90.6%, a mean F1 score of 91.2%, and an mIoU of 83.9%, outperforming mainstream models such as U-Net and DeepLabv3+, while its parameter count is only 8.8% of that of DeepLabv3+. These results support its effectiveness and practicality for the automatic interpretation of high-resolution remote sensing images.
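To illustrate why DSC reduces the parameter count, the following minimal sketch compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1 x 1 pointwise convolution. The layer sizes (256 input channels, 256 output channels, 3 x 3 kernel) and the function names are assumptions chosen for illustration; they are not taken from MS-DANet, and bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def reduction_ratio(c_in: int, c_out: int, k: int) -> float:
    """Fraction of the standard conv's weights that the separable version keeps."""
    return depthwise_separable_params(c_in, c_out, k) / standard_conv_params(c_in, c_out, k)


if __name__ == "__main__":
    # Hypothetical layer sizes, not taken from the paper.
    c_in, c_out, k = 256, 256, 3
    print("standard conv weights:      ", standard_conv_params(c_in, c_out, k))
    print("depthwise separable weights:", depthwise_separable_params(c_in, c_out, k))
    print("ratio:                      ", round(reduction_ratio(c_in, c_out, k), 4))
```

For these assumed sizes the ratio equals 1/c_out + 1/k^2, so a 3 x 3 separable layer keeps roughly one ninth of the weights of its standard counterpart. This accounts for only part of the reported compression; the contribution of the MobileNetV2 backbone and of knowledge distillation is not modeled here.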
Liu et al. (Sun,) studied this question.