What question did this study set out to answer?

The aim is to enhance semantic segmentation of remote sensing images by addressing scale changes and sample imbalance.

March 10, 2026 | Open Access

MSTNet: Multi‐Scale Contextual Analysis Network for Semantic Segmentation of Remote Sensing Images

Key Points

MSTNet, a multi-scale contextual analysis network, is proposed for semantic segmentation of remote sensing images.
A semantic information enhancement module (SIE) uses adaptive feature clustering to strengthen per-category feature representation and alleviate sample imbalance.
A weighted feature fusion module (WFF) adaptively aggregates features across levels.
A multi-scale context enhancement module (MSCE) combines convolution and Transformer operations to mine local and global context.
A pixel space feature optimisation module (SFEM) enhances spatial detail.
Together, these modules improve the handling of scale variation and sample imbalance.
MSTNet is reported to reach competitive or better overall accuracy (OA), mean Intersection over Union (mIoU), and mean F1 score (mF1).
Experiments cover four datasets: UAVid, LoveDA, Potsdam, and Vaihingen.
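
The three metrics named above (OA, mIoU, mF1) follow standard definitions based on a per-class confusion matrix. The sketch below is a minimal, dependency-free illustration of those standard definitions. It is not the authors' evaluation code, and the function names `confusion_matrix` and `segmentation_metrics` are hypothetical. The toy labels are made up to show why OA alone can hide poor results on a rare class.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Build a matrix where cm[t][p] counts pixels of true class t predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm):
    """Return OA, mIoU and mF1 computed from a square confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    ious, f1s = [], []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                       # true class i, predicted otherwise
        fp = sum(cm[r][i] for r in range(n)) - tp  # predicted i, true class otherwise
        union = tp + fp + fn
        denom = 2 * tp + fp + fn
        # Assumption: a class absent from both labels and predictions scores 0.
        # Some toolkits skip such classes instead of counting them.
        ious.append(tp / union if union else 0.0)
        f1s.append(2 * tp / denom if denom else 0.0)
    return {
        "OA": correct / total,
        "mIoU": sum(ious) / n,
        "mF1": sum(f1s) / n,
    }


# Toy example: class 0 is common (6 pixels), class 1 is rare (2 pixels).
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0]
metrics = segmentation_metrics(confusion_matrix(y_true, y_pred, 2))
print(metrics)  # OA = 0.75, mIoU ≈ 0.524, mF1 ≈ 0.667
```

In this toy case OA is 0.75 while mIoU is only about 0.52, because the rare class contributes an IoU of 1/3. Class-averaged metrics such as mIoU and mF1 are therefore the more informative ones when sample imbalance is the concern.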

Abstract

With the rapid development of deep learning, research on semantic segmentation of remote sensing images has made significant progress. However, remote sensing images commonly exhibit large scale differences between object types and imbalanced sample counts, which lead to poor segmentation results, especially for small targets and rare object types. To address these challenges, this work proposes MSTNet, a semantic segmentation method for remote sensing images based on multi‐scale contextual information analysis. Its core design includes a semantic information enhancement module (SIE), which strengthens the feature representation of different categories through adaptive feature clustering to alleviate sample imbalance. A weighted feature fusion module (WFF) adaptively aggregates cross‐level features and works together with a multi‐scale context enhancement module (MSCE), which combines convolution and Transformer operations to mine local and global context and thereby cope with scale changes. In addition, the network contains a pixel space feature optimisation module (SFEM) to enhance spatial details. Experiments on the UAVid, LoveDA, Potsdam and Vaihingen datasets show that MSTNet significantly improves the handling of scale changes and class imbalance and achieves competitive or better performance on key metrics such as OA, mIoU, and mF1.
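
The abstract does not say how the WFF module computes its fusion weights. As a purely generic illustration of weighted cross-level fusion, the sketch below combines same-sized feature maps using one scalar weight per level, normalised with a softmax so the weights sum to 1. The per-level scalar weighting, the softmax, and the name `weighted_fuse` are all assumptions made for illustration, not the authors' design. In a real network the logits would be learned parameters and the maps would first be resized to a common resolution.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scalars."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fuse(feature_maps, logits):
    """Fuse same-shaped 2D feature maps as a convex combination.

    feature_maps: list of 2D lists (rows x cols), one per feature level.
    logits: one unnormalised scalar weight per level (hypothetical stand-in
            for learned parameters).
    """
    if len(feature_maps) != len(logits):
        raise ValueError("need exactly one logit per feature map")
    weights = softmax(logits)
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for weight, fmap in zip(weights, feature_maps):
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += weight * fmap[r][c]
    return fused


# Two toy 2x2 "levels" with equal logits: the result is their plain average.
shallow = [[1.0, 2.0], [3.0, 4.0]]
deep = [[3.0, 4.0], [5.0, 6.0]]
fused_equal = weighted_fuse([shallow, deep], [0.0, 0.0])
print(fused_equal)  # [[2.0, 3.0], [4.0, 5.0]]
```

Raising one logit shifts the fused map towards that level, which is the sense in which such a scheme lets training decide how much each level contributes.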
