What question did this study set out to answer?

How can multimodal data and a dual-stream feature-fusion network improve the automated identification of Benggang landforms?

March 25, 2026 | Open Access

Remote Sensing Identification of Benggang Using a Two-Stream Network with Multimodal Feature Enhancement and Sparse Attention

Key Points

Aimed to improve automated identification of Benggang landforms using multimodal data and a dual-stream feature-fusion network.
Developed DF-Net with two branches, one for digital orthophoto maps (DOM) and one for digital elevation models (DEM).
Enhanced high-frequency boundary information by applying Canny edge detection to the DOM.
Derived terrain factors such as slope and aspect from the DEM to provide morphological constraints.
Implemented a multiscale sparse attention fusion module to mitigate noise interference.
Used a zonal partitioning strategy for model evaluation in Anxi County.
Achieved 97.44% accuracy and 85.71% recall in independent tests.
Obtained an F1 score of 82.98%, outperforming existing CNN/transformer models.
Demonstrated enhanced classification stability with a multibranch ensemble approach.
Validated the effectiveness of the 'multimodal feature enhancement + sparse attention fusion' strategy.
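
The key points report recall and F1 but not precision. As a rough consistency check, the sketch below back-solves precision under the assumption that the F1 score is the usual harmonic mean of precision and recall for the Benggang class; the page does not state the averaging convention, and the function name is illustrative only.

```python
def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given F1 and R as fractions."""
    denom = 2.0 * recall - f1
    if denom <= 0:
        # No valid precision exists unless 2 * recall exceeds F1.
        raise ValueError("F1 and recall are inconsistent: need 2*recall > F1")
    return f1 * recall / denom


# Values reported in the key points, expressed as fractions.
precision = implied_precision(f1=0.8298, recall=0.8571)
print(f"implied precision: {precision:.2%}")
```

Under that assumption the implied precision comes out at roughly 80%, a derived figure rather than one reported by the study.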

Abstract

Benggang, a severely eroded landform and geohazard typical of the red-soil hilly regions of southern China, has a fragmented texture, irregular boundaries, and a strong resemblance to background objects such as bare soil and roads. These traits pose a dual challenge of “multiscale variability + strong noise” for automated identification at regional scales. To address insufficient information from a single modality and the limited representation of cross-scale features, this study proposes a dual-stream feature-fusion network (DF-Net) for multisource data consisting of a digital orthophoto map (DOM) and a digital elevation model (DEM). The method adopts ResNeSt50d as the backbone of both branches: on the DOM side, a Canny-edge channel is stacked to enhance high-frequency boundary information; on the DEM side, derived terrain factors, including slope, aspect, curvature, and hillshade, are introduced to provide morphological constraints. In the cross-modal fusion stage, a multiscale sparse attention fusion module is designed, which acquires contextual information via multiwindow average pooling and suppresses noise interference through top-K sparsification. In the decision stage, a multibranch ensemble is employed to improve classification stability. Taking Anxi County, Fujian Province, as the study area, a coregistered dataset of GF-2 DOM (1 m) and ALOS DEM (12.5 m) data is constructed, and a zonal partitioning strategy is adopted to evaluate the model’s generalization ability. The experimental results show that DF-Net achieves 97.44% accuracy, 85.71% recall, and an 82.98% F1 score in the independent test zone, outperforming multiple mainstream CNN/transformer classification models.
This study indicates that the strategy of “multimodal feature enhancement + sparse attention fusion” tailored to Benggang erosional landforms can significantly improve recognition performance under complex backgrounds, providing technical support for rapid Benggang surveys and governance-effectiveness assessments.
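
The abstract names two ingredients of the fusion module, multiwindow average pooling and top-K sparsification, without giving the layer layout. The toy sketch below only illustrates those two ideas on a 1-D sequence of scalar features in plain Python; it is not DF-Net's module, and every function name, window size, and the residual-style sum are assumptions made for the example.

```python
import math


def avg_pool_1d(values, window):
    """Same-length average pooling with an odd window that shrinks at the borders."""
    half = window // 2
    pooled = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        pooled.append(sum(values[lo:hi]) / (hi - lo))
    return pooled


def multiscale_context(values, windows=(1, 3, 5)):
    """Average the pooled sequences obtained with several window sizes."""
    pooled = [avg_pool_1d(values, w) for w in windows]
    return [sum(col) / len(windows) for col in zip(*pooled)]


def topk_sparse_softmax(scores, k):
    """Softmax over the k largest scores; all other positions get weight 0."""
    if not scores:
        return []
    k = max(1, min(k, len(scores)))
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in keep)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - peak) for i in keep}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]


def sparse_cross_attention(dom_feat, dem_feat, k=2, windows=(1, 3, 5)):
    """Each DOM position attends to its top-k DEM positions (assumed toy design)."""
    context = multiscale_context(dem_feat, windows)
    fused = []
    for q in dom_feat:
        weights = topk_sparse_softmax([q * c for c in context], k)
        # Residual-style sum of the query and the attended DEM features.
        fused.append(q + sum(w * v for w, v in zip(weights, dem_feat)))
    return fused


dom = [0.2, 1.5, 0.1, 0.9]   # hypothetical DOM-branch features
dem = [0.0, 2.0, 0.1, 1.0]   # hypothetical DEM-branch features
print(sparse_cross_attention(dom, dem, k=2))
```

Zeroing all but the K strongest matches means weakly related positions contribute nothing to the fused feature, which is one plausible reading of how sparsification could suppress background noise.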
