What does this research mean for the field?

The proposed multimodal fusion attention network achieves 84.9% mean Average Precision (mAP) for real-time obstacle detection and avoidance in low-altitude unmanned aerial vehicles. Novelty: novel finding. Consensus alignment: neutral.

What question did this study set out to answer?

The aim is to develop a robust system for real-time obstacle detection and avoidance using multimodal data.

February 24, 2026 | Open Access

Multimodal Fusion Attention Network for Real-Time Obstacle Detection and Avoidance for Low-Altitude Aircraft

Key Points

Integration of visual imagery and LiDAR data
Implementation of a bidirectional cross-modal attention mechanism
Use of adaptive weighting based on sensor confidence
Incorporation of gated fusion units and multi-scale feature pyramids
Creation of a hierarchical avoidance decision framework
Achieved 84.9% mean Average Precision (mAP)
Maintained 47.3 FPS on GPU hardware and 23.6 FPS on embedded platforms
Ablation studies identify cross-modal attention as the component that contributes the largest performance improvement
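
To make the bidirectional cross-modal attention idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes image and LiDAR features have already been encoded as token vectors of equal dimension, and it omits the learned query, key, and value projections a trained network would use. The function names (`attend`, `bidirectional_cross_attention`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, context):
    """Scaled dot-product attention: each query token attends over context tokens.

    Assumption: identity projections stand in for learned Q/K/V weights.
    """
    dim = len(queries[0])
    scale = math.sqrt(dim)
    attended = []
    for q in queries:
        scores = [sum(qi * ci for qi, ci in zip(q, c)) / scale for c in context]
        weights = softmax(scores)
        attended.append(
            [sum(w * c[j] for w, c in zip(weights, context)) for j in range(dim)]
        )
    return attended


def bidirectional_cross_attention(image_tokens, lidar_tokens):
    """Run attention in both directions and add a residual connection."""
    image_ctx = attend(image_tokens, lidar_tokens)   # image queries LiDAR
    lidar_ctx = attend(lidar_tokens, image_tokens)   # LiDAR queries image
    image_out = [
        [a + b for a, b in zip(tok, ctx)]
        for tok, ctx in zip(image_tokens, image_ctx)
    ]
    lidar_out = [
        [a + b for a, b in zip(tok, ctx)]
        for tok, ctx in zip(lidar_tokens, lidar_ctx)
    ]
    return image_out, lidar_out


# Toy input: two image tokens and one LiDAR token, each of dimension 2.
image_out, lidar_out = bidirectional_cross_attention(
    [[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]
)
```

Each modality keeps its own token count, so the enriched features can be passed on to later fusion stages without resampling.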

Abstract

The rapid expansion of low-altitude unmanned aerial vehicles demands robust obstacle detection and avoidance systems capable of operating under diverse environmental conditions. This paper proposes a multimodal fusion attention network that integrates visual imagery and Light Detection and Ranging (LiDAR) point cloud data for real-time obstacle perception. The architecture incorporates a bidirectional cross-modal attention mechanism that learns dynamic correspondences between heterogeneous sensor modalities, enabling adaptive feature integration based on contextual reliability. An adaptive weighting component automatically modulates modal contributions according to estimated sensor confidence under varying environmental conditions. The network further employs gated fusion units and multi-scale feature pyramids to ensure comprehensive obstacle representation across different distances. A hierarchical avoidance decision framework translates detection outputs into executable control commands through threat assessment and graduated response strategies. Experimental evaluation on both public benchmarks and a purpose-collected low-altitude obstacle dataset demonstrates that the proposed method achieves 84.9% mean Average Precision (mAP) while maintaining 47.3 frames per second (FPS) on Graphics Processing Unit (GPU) hardware and 23.6 FPS on embedded platforms. Ablation studies confirm the contribution of each architectural component, with cross-modal attention providing the most substantial performance improvement.
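
The adaptive weighting, gated fusion, and graduated response steps described in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: a single scalar gate computed from the two confidence scores stands in for the learned gated fusion units, time-to-collision stands in for the paper's threat assessment, and the `sharpness` parameter, response names, and thresholds are hypothetical rather than taken from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse(image_feat, lidar_feat, image_conf, lidar_conf, sharpness=10.0):
    """Blend two equal-length feature vectors according to sensor confidence.

    image_conf and lidar_conf are assumed to lie in [0, 1]. The gate moves
    toward 1 (favor image) or 0 (favor LiDAR) as the confidences diverge.
    """
    gate = sigmoid(sharpness * (image_conf - lidar_conf))
    return [gate * a + (1.0 - gate) * b for a, b in zip(image_feat, lidar_feat)]


def choose_response(distance_m, closing_speed_mps):
    """Map an obstacle to a graduated response using time-to-collision.

    Thresholds (in seconds) are hypothetical placeholders.
    """
    if closing_speed_mps <= 0:
        return "continue"            # obstacle is not approaching
    time_to_collision = distance_m / closing_speed_mps
    if time_to_collision < 1.0:
        return "emergency_stop"
    if time_to_collision < 3.0:
        return "evade"
    if time_to_collision < 6.0:
        return "slow_down"
    return "continue"


# With equal confidence the gate is 0.5, so features are simply averaged.
balanced = fuse([1.0, 0.0], [0.0, 1.0], image_conf=0.8, lidar_conf=0.8)

# In poor lighting the image confidence drops and LiDAR dominates.
low_light = fuse([1.0, 0.0], [0.0, 1.0], image_conf=0.1, lidar_conf=0.9)

action = choose_response(distance_m=10.0, closing_speed_mps=5.0)
```

Ordering the checks from the most to the least urgent threshold ensures the strongest applicable response is always chosen first.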
