March 3, 2026 | Open Access

Cooperative Air–Ground Perception Framework for Drivable Area Detection Using Multi-Source Data Fusion

Key Points

DynCoANet, the road segmentation network, attains an Intersection over Union (IoU) of 61.14% on the DeepGlobe road extraction dataset.
Cross-view localization between UAV maps and UGV LiDAR reduces average position error by approximately 10% on KITTI sequences.
The three-stage system combines DynCoANet for semantic segmentation of UAV imagery, an enhanced particle filter for cross-view localization, and a distance-adaptive fusion transformer (DAFT) that fuses UAV semantic features with LiDAR BEV representations.
Real-world experiments with a coordinated UAV-UGV platform confirm the framework's robustness in occlusion-heavy and geometrically complex scenarios.
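
For readers unfamiliar with the IoU metric quoted above, the following is a minimal sketch of how it is commonly computed for binary masks. It is not the paper's evaluation code; the function name `iou`, the list-of-rows mask layout, and the convention for two empty masks are assumptions made for illustration.

```python
def iou(pred, target):
    """Intersection over Union of two same-shaped binary masks.

    Masks are lists of rows containing 0/1 values (hypothetical layout).
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # pixel labeled road in both masks
            if p or t:
                union += 1         # pixel labeled road in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy 2 x 3 masks: 2 pixels overlap, 4 pixels are road in at least one mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = iou(pred, target)  # 2 / 4
```

On this scale, an IoU of 61.14% means the overlap between predicted and reference road pixels covers roughly three fifths of their combined area.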

Abstract

Drivable area (DA) detection in unstructured off-road environments remains challenging for unmanned ground vehicles (UGVs) due to limited field-of-view, persistent occlusions, and the inherent limitations of individual sensors. While existing fusion approaches combine aerial and ground perspectives, they often struggle with misaligned spatiotemporal viewpoints, dynamic environmental changes, and ineffective feature integration, particularly at intersections or under long-range occlusion. To address these issues, this paper proposes a cooperative air–ground perception framework based on multi-source data fusion. Our three-stage system first introduces DynCoANet, a semantic segmentation network incorporating directional strip convolution and connectivity attention to extract topologically consistent road structures from UAV imagery. Second, an enhanced particle filter with semantic road constraints and diversity-preserving resampling achieves robust cross-view localization between UAV maps and UGV LiDAR. Finally, a distance-adaptive fusion transformer (DAFT) dynamically fuses UAV semantic features with LiDAR BEV representations via confidence-guided cross-attention, balancing geometric precision and semantic richness according to spatial distance. Extensive evaluations demonstrate the effectiveness of our approach: on the DeepGlobe road extraction dataset, DynCoANet attains an IoU of 61.14%; cross-view localization on KITTI sequences reduces average position error by approximately 10%; and DA detection on OpenSatMap outperforms Grid-DATrNet by 8.42% in accuracy for large-scale regions (400 m × 400 m). Real-world experiments with a coordinated UAV-UGV platform confirm the framework’s robustness in occlusion-heavy and geometrically complex scenarios. This work provides a unified solution for reliable DA perception through tightly coupled cross-modal alignment and adaptive fusion.
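
The abstract does not spell out how the diversity-preserving resampling works. As a rough illustration of the general idea, the sketch below shows one standard way to limit particle depletion: systematic resampling triggered only when the effective sample size (ESS) drops below a threshold. This is a generic technique swapped in for illustration, not the authors' method; the function names and the `ess_fraction` parameter are hypothetical.

```python
import random


def effective_sample_size(weights):
    """ESS of normalized weights; equals len(weights) when they are uniform."""
    return 1.0 / sum(w * w for w in weights)


def systematic_resample(weights, rng):
    """Return particle indices drawn with one random offset and even spacing."""
    n = len(weights)
    start = rng.random() / n
    indices = []
    cumulative = weights[0]
    i = 0
    for k in range(n):
        u = start + k / n
        # Advance to the particle whose cumulative weight covers u.
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices


def maybe_resample(particles, weights, rng, ess_fraction=0.5):
    """Resample only when the weights have degenerated (assumed threshold)."""
    total = sum(weights)
    weights = [w / total for w in weights]
    n = len(particles)
    if effective_sample_size(weights) >= ess_fraction * n:
        return particles, weights  # still diverse enough: keep all particles
    idx = systematic_resample(weights, rng)
    return [particles[i] for i in idx], [1.0 / n] * n


rng = random.Random(0)
poses = ["a", "b", "c", "d"]  # stand-ins for (x, y, heading) hypotheses
kept, kept_w = maybe_resample(poses, [0.25, 0.25, 0.25, 0.25], rng)
resampled, resampled_w = maybe_resample(poses, [0.97, 0.01, 0.01, 0.01], rng)
```

Skipping the resampling step while the weights are still well spread avoids discarding low-weight hypotheses prematurely, which matters when occlusions make the LiDAR observation temporarily ambiguous. In a full filter, the weights would come from comparing each pose hypothesis against the UAV map, for example penalizing poses that fall off the segmented road.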
