What question did this study set out to answer?

The aim is to improve semantic segmentation of remote sensing images by addressing scale variation and class-wise distribution challenges.

April 4, 2026 · Open Access

RTAS-Net: A ResNet-transformer-ASPP semantic segmentation network for remote sensing images

Key Points

Developed RTAS-Net, a U-Net-based network for semantic segmentation of remote sensing images.
Incorporated ASPP at the high-semantic level to aggregate multi-scale context and enlarge the effective receptive field.
Used the window-based self-attention of Swin Transformer to model cross-region dependencies.
Embedded a lightweight mini-ASPP at high-resolution skip connections to reinforce neighborhood information, and MobileViT to strengthen fine-grained representations.
Consistently improved mIoU, mF1, and overall accuracy (OA) on the ISPRS Potsdam, Vaihingen, and LoveDA datasets.
Improved recognition of small objects and delineation of their boundaries.
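All three reported metrics can be derived from a per-class confusion matrix. The sketch below uses the standard definitions: OA is the fraction of correctly labeled pixels, per-class IoU is TP / (TP + FP + FN), per-class F1 is 2TP / (2TP + FP + FN), and mIoU and mF1 average these over classes. The function name `segmentation_metrics` and the toy matrix are hypothetical, and details of the paper's evaluation protocol (for example, which classes are excluded) are not given in this summary.

```python
from typing import Dict, List


def segmentation_metrics(conf: List[List[int]]) -> Dict[str, float]:
    """Compute OA, mIoU and mF1 from a square confusion matrix.

    conf[i][j] = number of pixels with ground-truth class i predicted as class j.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[c][c] for c in range(n))
    ious: List[float] = []
    f1s: List[float] = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                       # class-c pixels labeled as another class
        fp = sum(conf[r][c] for r in range(n)) - tp  # other pixels labeled as class c
        if tp + fp + fn == 0:
            continue  # class absent from both ground truth and prediction
        ious.append(tp / (tp + fp + fn))
        f1s.append(2 * tp / (2 * tp + fp + fn))
    return {
        "OA": correct / total,
        "mIoU": sum(ious) / len(ious),
        "mF1": sum(f1s) / len(f1s),
    }


# Hypothetical 3-class example (rows: ground truth, columns: prediction)
toy = [[50, 2, 3],
       [5, 30, 0],
       [0, 4, 6]]
print({k: round(v, 4) for k, v in segmentation_metrics(toy).items()})
# → {'OA': 0.86, 'mIoU': 0.6755, 'mF1': 0.7952}
```

Note that mF1 is always at least mIoU for the same predictions, since per class F1 = 2·IoU / (1 + IoU), so gains should be compared metric by metric.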

Abstract

Semantic segmentation of remote sensing images faces pronounced scale variation and complex class-wise spatial distributions, which often cause semantic discontinuity across large regions and loss of fine detail in small objects. To address these issues, this paper proposes RTAS-Net (ResNet–Transformer–ASPP Segmentation Network), a U-Net–based remote sensing semantic segmentation network that enhances feature representation through the coordinated design of multi-scale context aggregation, fine-scale reinforcement, and local-to-global modeling. At the high-semantic level, the network incorporates ASPP to aggregate multi-scale contextual information and enlarge the effective receptive field, and integrates the window-based self-attention mechanism of Swin Transformer to model cross-region dependencies, improving semantic consistency over large areas. At the high-resolution skip connections, a lightweight mini-ASPP is embedded to reinforce and pre-fuse fine-scale neighborhood information, and MobileViT is introduced to strengthen local texture and fine-grained structural representations, improving the recognition and boundary delineation of small objects. Rather than simply stacking modules, RTAS-Net unifies the modeling of global semantics and local details through coordinated cross-level pathways. Experiments on the ISPRS Potsdam, Vaihingen, and LoveDA datasets show consistent improvements in mIoU, mF1, and OA. The paper also provides a comprehensive analysis of parameter count and inference efficiency, validating the method's effectiveness and practical applicability.
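ASPP enlarges the receptive field because a 3×3 convolution with dilation rate r places its taps r pixels apart, so it spans 3 + 2(r - 1) pixels per side with no extra weights; running several rates in parallel and fusing them with a 1×1 convolution gives multi-scale context in one layer. The pure-Python sketch below illustrates only that core idea on a single-channel grid and is not the authors' implementation. The rates 6, 12, 18 are the ones used in DeepLabv3, the mini-ASPP rates 1, 2, 3 are hypothetical because the summary states neither, and the image-pooling branch, batch normalization, and multi-channel features of a real ASPP are omitted. All function names are illustrative.

```python
from typing import List

Grid = List[List[float]]


def effective_kernel_size(k: int, rate: int) -> int:
    """Side length (in pixels) spanned by a k x k kernel with dilation `rate`."""
    return k + (k - 1) * (rate - 1)


def dilated_conv3x3(x: Grid, w: Grid, rate: int) -> Grid:
    """Single-channel 3x3 dilated convolution (cross-correlation, as in deep
    learning frameworks) with zero padding equal to `rate`, so the output keeps
    the input's height and width."""
    h, wd = len(x), len(x[0])
    out = [[0.0] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            acc = 0.0
            for a in range(3):
                for b in range(3):
                    # Kernel taps are spaced `rate` pixels apart around (i, j)
                    ii, jj = i + (a - 1) * rate, j + (b - 1) * rate
                    if 0 <= ii < h and 0 <= jj < wd:
                        acc += w[a][b] * x[ii][jj]
            out[i][j] = acc
    return out


def aspp_branches(x: Grid, kernels: List[Grid], rates: List[int]) -> List[Grid]:
    """Run parallel dilated 3x3 branches, one kernel per rate."""
    return [dilated_conv3x3(x, k, r) for k, r in zip(kernels, rates)]


def fuse_1x1(branches: List[Grid], weights: List[float], bias: float = 0.0) -> Grid:
    """1x1 convolution over the channel-concatenated branch outputs with a
    single output channel, i.e. a per-pixel weighted sum across branches."""
    h, wd = len(branches[0]), len(branches[0][0])
    return [[bias + sum(wk * br[i][j] for wk, br in zip(weights, branches))
             for j in range(wd)] for i in range(h)]


# Receptive-field arithmetic for 3x3 branches
print([effective_kernel_size(3, r) for r in (6, 12, 18)])  # DeepLabv3 rates → [13, 25, 37]
print([effective_kernel_size(3, r) for r in (1, 2, 3)])    # hypothetical mini-ASPP rates → [3, 5, 7]

# Impulse response: each branch touches pixels `rate` apart from the center
x = [[0.0] * 9 for _ in range(9)]
x[4][4] = 1.0
ones = [[1.0] * 3 for _ in range(3)]
branches = aspp_branches(x, [ones, ones, ones], [1, 2, 3])
fused = fuse_1x1(branches, [1.0, 1.0, 1.0])
```

The small rates suit high-resolution skip features, where neighborhoods are measured in a few pixels, while large rates suit the low-resolution, high-semantic features; whether RTAS-Net uses exactly this branch layout is not stated in the summary.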

Cite This Study

Wang et al. studied this question.

synapsesocial.com/papers/69d0aff2659487ece0fa6150
https://doi.org/10.1371/journal.pone.0343729