ABSTRACT Deep learning has driven significant progress in the semantic segmentation of remote sensing images. However, such images commonly exhibit large differences in scale between object types and unbalanced sample numbers across categories, which lead to poor segmentation results, especially for small targets and rare object types. To address these challenges, MSTNet, a remote sensing image semantic segmentation method based on multi‐scale contextual information analysis, is proposed. Its core design includes a semantic information enhancement (SIE) module, which strengthens the feature expression of different categories through adaptive feature clustering to alleviate sample imbalance; a weighted feature fusion (WFF) module, which adaptively aggregates cross‐level features and works together with a multi‐scale context enhancement (MSCE) module, combining convolution and Transformer operations to mine local and global context and cope with scale changes; and a pixel space feature optimisation module (SFEM), which enhances spatial details. Experiments on the UAVid, LoveDA, Potsdam and Vaihingen datasets show that MSTNet improves the handling of scale variation and sample imbalance and achieves competitive or better performance on key metrics such as overall accuracy (OA), mean intersection over union (mIoU) and mean F1 score (mF1).
Wang et al. (Thu,) studied this question.