What question did this study set out to answer?

This study aims to enhance high-resolution land cover semantic segmentation by addressing class imbalance and fine-scale textures.

May 8, 2026 · Open Access

Enhancing land cover semantic segmentation with convolutional block attention modules and deep supervision

Key Points

Introduced MUSCLE-Net, a Multi-Scale Land Cover Network with deep supervision in the decoder.
Implemented an auxiliary low-resolution segmentation task for semantic consistency across spatial resolutions.
Incorporated Convolutional Block Attention Modules to refine feature selection in the decoder.
Achieved an overall accuracy of 66.48% on the DynamicEarthNet dataset, outperforming UNet by 1.05%, DeepLabV3 by 6.21%, and PSPNet by 7.70%.
Reached an overall accuracy of 70.10% on the DFC2020 dataset, improving upon UNet by 2.86%, PSPNet by 4.98%, and DeepLabV3 by 6.10%.
Showed lower variability between runs, indicating enhanced robustness in minority land-cover classes.
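
The deep supervision idea in the key points above can be illustrated with a small loss sketch: a full-resolution cross-entropy term plus a weighted cross-entropy term on an auxiliary low-resolution prediction. This is not the authors' implementation. The majority-vote downsampling of the label map, the `aux_weight` value of 0.4, and all function names are assumptions made for illustration, and NumPy stands in for a deep learning framework.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean pixel-wise cross-entropy. logits: (C, H, W); labels: (H, W) ints."""
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    rows, cols = np.indices(labels.shape)
    return float(-log_probs[labels, rows, cols].mean())

def downsample_labels(labels, factor, num_classes):
    """Majority vote over non-overlapping factor x factor blocks.

    Assumed scheme: the source does not say how low-resolution targets are built.
    Ties resolve to the lowest class index.
    """
    h, w = labels.shape
    blocks = labels.reshape(h // factor, factor, w // factor, factor)
    blocks = blocks.transpose(0, 2, 1, 3).reshape(h // factor, w // factor, -1)
    votes = np.eye(num_classes, dtype=int)[blocks].sum(axis=2)  # (h', w', C)
    return votes.argmax(axis=-1)

def deep_supervision_loss(main_logits, aux_logits, labels, aux_weight=0.4):
    """Full-resolution loss plus a weighted auxiliary low-resolution loss.

    aux_weight=0.4 is a hypothetical value, not taken from the paper.
    """
    num_classes = main_logits.shape[0]
    factor = labels.shape[0] // aux_logits.shape[1]
    low_res_labels = downsample_labels(labels, factor, num_classes)
    main_term = cross_entropy(main_logits, labels)
    aux_term = cross_entropy(aux_logits, low_res_labels)
    return main_term + aux_weight * aux_term

# Toy usage: 3 classes, a 4x4 label map, and a 2x2 auxiliary head.
labels = np.array([[0, 0, 1, 1],
                   [0, 1, 1, 1],
                   [2, 2, 0, 0],
                   [2, 2, 0, 1]])
rng = np.random.default_rng(0)
loss = deep_supervision_loss(rng.normal(size=(3, 4, 4)),
                             rng.normal(size=(3, 2, 2)),
                             labels)
```

The auxiliary term penalizes the early decoder stage when its coarse prediction disagrees with the regional majority class, which is one plausible way to encourage region-level semantics before fine detail is recovered.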

Abstract

High-resolution land cover semantic segmentation is challenged by strong class imbalance, spatial fragmentation of minority classes, and the presence of fine-scale textures and sensor noise that can dominate early feature learning. In addition, producing high-resolution labeled maps is time-consuming and requires expert annotation, while low-resolution maps are easier to obtain but lack spatial precision. To address these challenges, we propose MUSCLE-Net, a Multi-Scale Land Cover Network that explicitly enforces semantic consistency across spatial resolutions through deep supervision. By introducing an auxiliary low-resolution segmentation task during early decoding, the network is constrained to learn semantically meaningful regional representations before recovering fine spatial details, promoting a coarse-to-fine decoding process that mitigates overfitting to high-frequency noise. Convolutional Block Attention Modules are incorporated in the decoder to further refine spatial and channel-wise feature selection. For the DynamicEarthNet dataset, MUSCLE-Net achieves an overall accuracy of 66.48%, outperforming UNet by 1.05%, DeepLabV3 by 6.21%, and PSPNet by 7.70%. For the DFC2020 dataset, MUSCLE-Net reaches an overall accuracy of 70.10%, improving upon UNet by 2.86%, PSPNet by 4.98%, and DeepLabV3 by 6.10%, and consistently shows lower variability between runs, reflecting enhanced robustness in minority land-cover classes.

• Introduces MUSCLE-Net with multi-resolution deep supervision in the decoder.
• Enforces semantic consistency across scales to improve minority class learning.
• Proposes an attention-enhanced decoder using CBAM without residual addition.
• Re-injects auxiliary supervision features to guide early decoder representations.
• Demonstrates consistent accuracy and stability gains on two benchmark datasets.
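
For readers unfamiliar with Convolutional Block Attention Modules, the following is a minimal NumPy sketch of the general CBAM pattern: channel attention followed by spatial attention, with the reweighted features returned directly and no residual addition. It is an illustration under stated assumptions rather than the paper's decoder block. The weight shapes, the reduction ratio, the 7x7 kernel size, and all function names are hypothetical, and the weights here are random rather than learned.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Per-channel weights from avg- and max-pooled descriptors. x: (C, H, W)."""
    def shared_mlp(v):
        # Two-layer MLP with ReLU, shared by both pooled descriptors.
        return w2 @ np.maximum(w1 @ v, 0.0)
    avg_desc = x.mean(axis=(1, 2))  # (C,)
    max_desc = x.max(axis=(1, 2))   # (C,)
    return sigmoid(shared_mlp(avg_desc) + shared_mlp(max_desc))

def spatial_attention(x, kernel):
    """Per-pixel weights from a k x k conv over [channel-mean, channel-max] maps.

    kernel: (2, k, k) with odd k; zero padding keeps the output at (H, W).
    """
    stacked = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)

def cbam(x, w1, w2, kernel):
    """Channel attention, then spatial attention; no residual addition."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, kernel)[None, :, :]

# Toy usage: 8 channels, reduction ratio 4 (assumed), 7x7 spatial kernel (assumed).
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 6, 6))
refined = cbam(features,
               rng.normal(size=(2, 8)),   # w1: C -> C / 4
               rng.normal(size=(8, 2)),   # w2: C / 4 -> C
               rng.normal(size=(2, 7, 7)))
```

Because both attention maps pass through a sigmoid, every output value is a scaled-down copy of the corresponding input value; omitting the residual connection means suppressed features are not restored by a skip path.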

Cite This Study

Mobsite et al. studied this question.

synapsesocial.com/papers/69fd7fb8bfa21ec5bbf08424 https://doi.org/10.1016/j.aiig.2026.100222
