What question did this study set out to answer?

The aim is to enhance semantic segmentation accuracy in remote sensing imagery using a hierarchical multimodal fusion model.

April 30, 2026 | Open Access

A multi-level feature fusion semantic segmentation model for remote sensing image

Key Points

Developed the TransDeepUNet model combining RGB imagery and DSM data.
Utilized a parameter-sharing dual-branch encoder and cross-modal attention modules.
Evaluated performance on the ISPRS Vaihingen and Potsdam datasets and a high-resolution Swiss dataset.
Achieved an mIoU of 85.64% and mF1-score of 92.07% on the Potsdam dataset.
Outperformed existing CNN and hybrid models in segmentation accuracy.
Maintained competitive computational complexity alongside strong performance.
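
The mIoU and mF1-score quoted above are per-class averages that are conventionally derived from a pixel-level confusion matrix. The following is a minimal sketch of that standard computation, not the authors' evaluation script; the function names and the toy confusion matrix are hypothetical and chosen only for illustration.

```python
def per_class_scores(confusion):
    """Return (iou, f1) lists from a square confusion matrix.

    confusion[i][j] counts pixels whose true class is i and predicted class is j.
    """
    n = len(confusion)
    ious, f1s = [], []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # true c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(n)) - tp   # predicted c, true otherwise
        union = tp + fp + fn
        ious.append(tp / union if union else 0.0)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return ious, f1s


def mean_scores(confusion):
    """Average the per-class IoU and F1 values to obtain (mIoU, mF1)."""
    ious, f1s = per_class_scores(confusion)
    return sum(ious) / len(ious), sum(f1s) / len(f1s)


# Hypothetical 3-class confusion matrix (made-up counts, not from the paper).
toy_confusion = [
    [50, 2, 3],
    [4, 40, 1],
    [1, 2, 30],
]
miou, mf1 = mean_scores(toy_confusion)
print(f"mIoU = {miou:.4f}, mF1 = {mf1:.4f}")
```

Because per-class F1 equals 2·IoU / (1 + IoU), it is never lower than the IoU of the same class, which is consistent with the reported mF1-score exceeding the reported mIoU.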

Abstract

Accurate semantic segmentation of remote sensing imagery requires both fine-grained boundary modelling and long-range contextual reasoning. To address this challenge, we propose TransDeepUNet, a hierarchical multimodal fusion network that integrates RGB imagery and DSM elevation data. The framework employs a parameter-sharing dual-branch encoder to preserve modality-specific representations. A shallow cross-modal attention module enhances structural details, while a deep cross-modal Transformer models global dependencies and semantic alignment. A cascaded decoder progressively reconstructs high-resolution segmentation maps. Experiments on the ISPRS Vaihingen and Potsdam datasets and a high-resolution Swiss dataset demonstrate consistent performance improvements over strong CNN and hybrid baselines. On the Potsdam dataset, TransDeepUNet achieves an mIoU of 85.64% and an mF1-score of 92.07%, outperforming comparable multimodal models while maintaining competitive computational complexity. The code is publicly available at: https://github.com/yingning01/TransDeepUNet.
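
The abstract does not spell out how the cross-modal attention modules are built, so the following is only a generic illustration of the idea: scaled dot-product cross-attention in which RGB feature tokens query DSM feature tokens, with a residual connection back to the RGB stream. Every name, shape, and weight matrix here is an assumption made for this sketch; the actual TransDeepUNet modules may differ and should be checked against the published code.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(rgb_tokens, dsm_tokens, w_q, w_k, w_v):
    """Generic cross-attention sketch (hypothetical, not the paper's module).

    rgb_tokens: (N, C) flattened RGB feature map, used as queries.
    dsm_tokens: (M, C) flattened DSM feature map, used as keys and values.
    Returns the fused (N, C) features and the (N, M) attention weights.
    """
    q = rgb_tokens @ w_q                      # (N, d)
    k = dsm_tokens @ w_k                      # (M, d)
    v = dsm_tokens @ w_v                      # (M, C)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product similarity
    weights = softmax(scores, axis=-1)        # each RGB token attends over DSM tokens
    fused = rgb_tokens + weights @ v          # residual keeps the RGB features
    return fused, weights


# Toy inputs: 16 tokens (e.g. a 4x4 feature map) with 8 channels per modality.
rng = np.random.default_rng(0)
n_tokens, channels, d = 16, 8, 4
rgb = rng.standard_normal((n_tokens, channels))
dsm = rng.standard_normal((n_tokens, channels))
w_q = rng.standard_normal((channels, d))
w_k = rng.standard_normal((channels, d))
w_v = rng.standard_normal((channels, channels))

fused, weights = cross_modal_attention(rgb, dsm, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

In a trained network the projection matrices would be learned and the attention would typically be multi-head; random weights are used here only so the sketch runs on its own.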

Cite This Study

Wang et al. (Sun,) studied this question.

synapsesocial.com/papers/69f2f0991e5f7920c6386b5f https://doi.org/10.1080/27669645.2026.2663213
