April 22, 2026 | Open Access

Heterogeneous attention fusion network with multi-branch decoding for UAV semantic segmentation

Key points

This research aims to improve semantic segmentation of UAV imagery by addressing key limitations of existing methods.
Introduced HAFNet, a multi-branch decoder featuring diverse attention types.
Employed a squeeze-and-excitation module for channel-level adaptive feature fusion.
Conducted extensive experiments across four public UAV benchmarks.
Achieved state-of-the-art mIoU scores of 72.1% on UAVid and 84.5% on ISPRS Vaihingen.
Outperformed competing methods such as UrbanSSF-L and UNetFormer.
Showed the largest gains on small-scale objects, including a +18.3% F1 improvement for cars.

Abstract

Semantic segmentation of unmanned aerial vehicle (UAV) imagery remains a formidable task owing to extreme scale variation, cluttered backgrounds, and oblique viewing geometry. Existing approaches suffer from three interrelated limitations: (i) single-attention architectures employ only one attention paradigm throughout the decoder, inherently restricting the network to either global context or fine-grained local detail but not both; (ii) homogeneous multi-branch designs replicate the same attention type across branches, which increases computation without introducing representational diversity; and (iii) prevailing feature fusion strategies (element-wise addition and concatenation) lack channel-level adaptivity and so fail to exploit the complementary strengths of heterogeneous feature sources. To overcome these limitations, we propose HAFNet (Heterogeneous Attention Fusion Network), which introduces a multi-branch decoder in which four parallel branches, employing multi-head, spatial, self-, and shifted-window attention, respectively, decode shared encoder features concurrently. A squeeze-and-excitation (SE) enhanced aggregation module then adaptively recalibrates and fuses the branch outputs at the channel level, enabling the network to leverage the complementary strengths of diverse attention mechanisms within a single forward pass. Extensive experiments on four public benchmarks demonstrate that HAFNet establishes new state-of-the-art results, achieving 72.1% mIoU on UAVid, 84.5% on ISPRS Vaihingen, 88.2% on ISPRS Potsdam, and 54.8% on LoveDA, surpassing the latest competing methods, including UrbanSSF-L and UNetFormer. Ablation studies further verify that each branch provides unique and complementary representations; the full four-branch configuration consistently outperforms every subset, yielding especially pronounced improvements on small-scale objects (+18.3% F1 for cars) and heterogeneous land-cover categories.
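The channel-level fusion described in the abstract can be illustrated with a minimal sketch of generic squeeze-and-excitation gating applied to stacked branch outputs. This is not the authors' implementation: the function names (`se_gates`, `se_fuse`), the random stand-ins for the four attention branches, the hidden size, and the choice to merge branches by a gated sum are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
import random


def global_avg_pool(channel):
    """Squeeze step: average one H x W channel down to a single scalar."""
    return sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))


def linear(vec, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def se_gates(channels, w1, b1, w2, b2):
    """Excitation step: squeeze -> FC -> ReLU -> FC -> sigmoid, one gate per channel."""
    squeezed = [global_avg_pool(ch) for ch in channels]
    hidden = [max(0.0, h) for h in linear(squeezed, w1, b1)]
    return [1.0 / (1.0 + math.exp(-z)) for z in linear(hidden, w2, b2)]


def se_fuse(branches, w1, b1, w2, b2):
    """Fuse B branch outputs, each a list of C channels (H x W nested lists).

    Assumption: branches are stacked along the channel axis, gated per
    channel, then summed across branches back to C channels.
    """
    num_channels = len(branches[0])
    stacked = [ch for branch in branches for ch in branch]  # B*C channels
    gates = se_gates(stacked, w1, b1, w2, b2)
    height, width = len(stacked[0]), len(stacked[0][0])
    fused = [[[0.0] * width for _ in range(height)] for _ in range(num_channels)]
    for idx, (gate, ch) in enumerate(zip(gates, stacked)):
        c = idx % num_channels  # channel index within its branch
        for i in range(height):
            for j in range(width):
                fused[c][i][j] += gate * ch[i][j]
    return fused, gates


# Hypothetical toy setup: 4 branches stand in for the four attention types.
random.seed(0)
B, C, H, W, HIDDEN = 4, 2, 3, 3, 2
branches = [[[[random.random() for _ in range(W)] for _ in range(H)]
             for _ in range(C)] for _ in range(B)]
w1 = [[random.uniform(-1, 1) for _ in range(B * C)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
w2 = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(B * C)]
b2 = [0.0] * (B * C)

fused, gates = se_fuse(branches, w1, b1, w2, b2)
print("gates per stacked channel:", [round(g, 3) for g in gates])
print("fused shape:", (len(fused), len(fused[0]), len(fused[0][0])))
```

In contrast to plain element-wise addition, each branch channel here receives its own input-dependent weight in (0, 1), which is the kind of channel-level adaptivity the abstract attributes to the SE-enhanced aggregation module. In a trained network the weights `w1`, `b1`, `w2`, `b2` would be learned rather than random.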
