What question did this study set out to answer?

The aim is to improve the accuracy of remote sensing semantic segmentation in challenging scenes while maintaining computational efficiency.

April 22, 2026 | Open Access

PFEB: A Post-Fusion Enhanced Decoder Module for Remote Sensing Semantic Segmentation

Key Points

Aimed to improve the accuracy of remote sensing semantic segmentation in challenging scenes while maintaining computational efficiency.
Developed the Post-fusion Enhanced Block (PFEB) for decoder-side refinement.
Combined channel expansion, depthwise and pointwise convolutions, efficient channel attention (ECA), and residual learning.
Evaluated the method on top of SegFormer, with MiT-B0 and MiT-B2 backbones, on the remote sensing benchmarks LoveDA and ISPRS Vaihingen.
Achieved 53.82 ± 0.31 mean intersection over union (mIoU) on LoveDA and 74.84 ± 0.41 mIoU on ISPRS Vaihingen with the MiT-B2 backbone.
Improved semantic correctness near boundaries and enhanced recovery of small objects.
Increased accuracy at a modest computational cost of approximately +0.53 M parameters and +8.7 G floating point operations (FLOPs).
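
For readers unfamiliar with the metric in the key points above, the following is a minimal sketch of how mean intersection over union (mIoU) and a "mean ± standard deviation" summary over repeated runs can be computed. It is not the evaluation code used in the study. The function name `mean_iou`, the toy labels, and the run scores are all hypothetical, and the use of the sample standard deviation is an assumption, since the source does not say how the ± values were obtained. Scores here are fractions in [0, 1], whereas the study reports percentages.

```python
import statistics


def mean_iou(pred, target, num_classes):
    """Mean IoU over flat label sequences, skipping classes absent from both."""
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # class appears in prediction or ground truth
            ious.append(intersection[c] / union)
    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious)


# Toy example: six pixels, three classes (hypothetical labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
miou = mean_iou(pred, target, num_classes=3)

# Hypothetical mIoU scores from three repeated runs (not results from the study).
run_scores = [0.50, 0.52, 0.54]
mean_score = statistics.mean(run_scores)
std_score = statistics.stdev(run_scores)  # sample std; an assumption

print(f"toy mIoU = {miou:.4f}")
print(f"runs: {mean_score:.2f} +/- {std_score:.2f}")
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5.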

Abstract

Remote sensing semantic segmentation is fundamental to applications such as land-cover mapping, urban analysis, and environmental monitoring. However, remote sensing scenes often exhibit pronounced scale variation, fragmented regions, dense small objects, and complex boundary transitions, making fine-grained prediction particularly challenging. Transformer-based architectures such as SegFormer have demonstrated strong capability in modeling long-range context through hierarchical encoding, yet their lightweight decoders mainly rely on linear projection and feature fusion, providing limited capacity for local refinement after multi-scale aggregation. This limitation may reduce spatial precision in boundary-sensitive and small-object-rich regions. To address this issue, we propose the Post-fusion Enhanced Block (PFEB), a lightweight decoder-side refinement module inserted after multi-scale feature fusion and before pixel-wise classification. PFEB combines channel expansion, depthwise and pointwise convolutions, efficient channel attention (ECA), and residual learning to enhance local semantic refinement while largely preserving computational efficiency. Built upon SegFormer, the proposed method was evaluated on two widely used remote sensing benchmarks, LoveDA and ISPRS Vaihingen, with both Mix Transformer-B0 (MiT-B0) and Mix Transformer-B2 (MiT-B2) backbones. Experimental results show that PFEB consistently improves the SegFormer baseline across datasets and model scales. With the MiT-B2 backbone, our method achieves 53.82 ± 0.31 mean intersection over union (mIoU) on LoveDA and 74.84 ± 0.41 mIoU on ISPRS Vaihingen. Boundary- and size-aware evaluations further indicate that the gains are mainly reflected in improved semantic correctness near boundaries and in better recovery of small objects. With only modest additional cost, approximately +0.53 M parameters and +8.7 G floating point operations (FLOPs), PFEB provides a favorable accuracy–efficiency trade-off. These results suggest that PFEB is an effective and lightweight post-fusion refinement module for improving fine-grained remote sensing semantic segmentation.
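
To make the cost claim more concrete, the sketch below counts the weights of a block built from the components the abstract names: a 1×1 channel expansion, a depthwise convolution, a 1×1 pointwise projection, and an ECA-style 1D convolution. The abstract does not specify the layer order, the number of fused channels, the expansion ratio, or the kernel sizes, so every value used here (256 channels, expansion 4, a 3×3 depthwise kernel, an ECA kernel of 3) is an illustrative assumption, and the function name `pfeb_param_count` is hypothetical. Biases and normalization layers are left out.

```python
def pfeb_param_count(channels, expansion, dw_kernel=3, eca_kernel=3):
    """Weight count for an assumed expand -> depthwise -> project -> ECA block.

    All arguments are illustrative assumptions, not values from the study.
    Biases and normalization parameters are ignored.
    """
    hidden = channels * expansion
    counts = {
        # 1x1 convolution that widens C channels to r*C channels
        "expand_1x1": channels * hidden,
        # depthwise convolution: one k x k filter per hidden channel
        "depthwise": hidden * dw_kernel * dw_kernel,
        # 1x1 pointwise convolution that projects r*C channels back to C
        "project_1x1": hidden * channels,
        # ECA: a single 1D kernel slid across the channel descriptor
        "eca_1d": eca_kernel,
    }
    counts["total"] = sum(counts.values())
    return counts


# Assumed configuration: 256 fused channels, expansion ratio 4.
counts = pfeb_param_count(channels=256, expansion=4)
for name, value in counts.items():
    print(f"{name:>12}: {value:,}")

# For comparison, a dense 3x3 convolution at the same hidden width.
dense_3x3 = (256 * 4) * (256 * 4) * 3 * 3
print(f"dense 3x3 at hidden width: {dense_3x3:,}")
```

Under these assumed values the total is 533,507 weights, about 0.53 M, which is in the same range as the reported parameter overhead, though this does not confirm the configuration actually used. The comparison line shows why the depthwise layer is cheap: a dense 3×3 convolution at the same width would need 9,437,184 weights, against 9,216 for the depthwise version.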
