Semantic segmentation of high-resolution multispectral remote sensing images has been studied intensively. However, shadow occlusions and similar colors and textures across categories reduce segmentation accuracy. In addition, target sizes in remote sensing images vary widely, and networks struggle to segment large and small targets equally well. This paper introduces the Transformer-based Multi-modal Fusion Network (TMFNet), which fuses multi-modal features and incorporates height features from the digital surface model (DSM) to supply additional cues that help distinguish the categories. Specifically, we introduce two parallel encoders that extract features from the different modalities, a Multi-Modal fusion model based on the Transformer (MMformer) that performs the multi-modal fusion, and a Border Region Attention based multi-level Fusion Module (BRAFM) that integrates cross-level features and improves small-target segmentation by exploiting details around object borders. Experimental results on the ISPRS Vaihingen and Potsdam benchmark datasets indicate that the proposed TMFNet outperforms state-of-the-art (SOTA) methods in segmentation performance.
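The abstract does not describe MMformer's internals. As a rough illustration only, here is one common way a Transformer block can fuse two modalities: single-head scaled dot-product cross-attention in which tokens from the optical (spectral) encoder act as queries and tokens from the DSM (height) encoder supply keys and values, followed by a residual addition. The function names, the choice of which modality queries which, and the omission of learned projection matrices, multiple heads, and normalization are all assumptions made for brevity; this is a hypothetical sketch, not the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain matrix product: (N x K) @ (K x M) -> (N x M).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_fusion(spectral: Matrix, height: Matrix) -> Matrix:
    """Hypothetical fusion of optical tokens with DSM tokens.

    spectral: N x d tokens from the optical encoder (used as queries).
    height:   M x d tokens from the DSM encoder (used as keys and values).
    Returns N x d tokens: spectral + softmax(Q K^T / sqrt(d)) V.
    Learned Q/K/V projections are omitted (assumption, for brevity).
    """
    d = len(spectral[0])
    scores = matmul(spectral, transpose(height))  # N x M similarity scores
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    attended = matmul(weights, height)  # N x d height context per optical token
    # Residual connection keeps the original spectral information.
    return [[s + a for s, a in zip(s_row, a_row)]
            for s_row, a_row in zip(spectral, attended)]
```

Because the output keeps the shape of the spectral tokens, it could be passed to a decoder that expects the optical feature layout; a full model would typically add learned projections and stack several such layers.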
Yutong Liu
Kun Gao
Hong Wang
International Journal of Applied Earth Observation and Geoinformation
Beijing Institute of Technology
Wuhan University of Science and Technology
DOI: https://doi.org/10.1016/j.jag.2024.104083