Drivable area (DA) detection in unstructured off-road environments remains challenging for unmanned ground vehicles (UGVs) due to limited field of view, persistent occlusions, and the inherent limitations of individual sensors. While existing fusion approaches combine aerial and ground perspectives, they often struggle with misaligned spatiotemporal viewpoints, dynamic environmental changes, and ineffective feature integration, particularly at intersections or under long-range occlusion. To address these issues, this paper proposes a cooperative air–ground perception framework based on multi-source data fusion. The framework has three stages. First, DynCoANet, a semantic segmentation network incorporating directional strip convolution and connectivity attention, extracts topologically consistent road structures from UAV imagery. Second, an enhanced particle filter with semantic road constraints and diversity-preserving resampling achieves robust cross-view localization between UAV maps and UGV LiDAR. Finally, a distance-adaptive fusion transformer (DAFT) dynamically fuses UAV semantic features with LiDAR BEV representations via confidence-guided cross-attention, balancing geometric precision and semantic richness according to spatial distance. Extensive evaluations demonstrate the effectiveness of our approach: on the DeepGlobe road extraction dataset, DynCoANet attains an IoU of 61.14%; cross-view localization on KITTI sequences reduces average position error by approximately 10%; and DA detection on OpenSatMap outperforms Grid-DATrNet by 8.42% in accuracy for large-scale regions (400 m × 400 m). Real-world experiments with a coordinated UAV–UGV platform confirm the framework’s robustness in occlusion-heavy and geometrically complex scenarios. This work provides a unified solution for reliable DA perception through tightly coupled cross-modal alignment and adaptive fusion.
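To make the idea of balancing the two sources according to spatial distance concrete, the following is a minimal sketch only, not the DAFT architecture itself. It replaces learned cross-attention with a hand-written logistic gate that trusts LiDAR near the vehicle and the UAV semantic map farther away. The function names, the 30 m midpoint, the steepness value, and the scalar `uav_confidence` are all illustrative assumptions and do not come from the paper.

```python
import math


def lidar_weight(distance_m, midpoint_m=30.0, steepness=0.15):
    """Logistic gate: close to 1 near the vehicle, close to 0 far away.

    midpoint_m and steepness are hypothetical values chosen for illustration.
    """
    # Clamp the exponent so math.exp cannot overflow for very large distances.
    z = min(steepness * (distance_m - midpoint_m), 700.0)
    return 1.0 / (1.0 + math.exp(z))


def fuse_cell(p_lidar, p_uav, distance_m, uav_confidence=1.0):
    """Blend two drivable-area probabilities for a single BEV cell.

    uav_confidence in [0, 1] scales down the UAV contribution where the
    aerial segmentation is uncertain (an assumed stand-in for
    confidence-guided attention).
    """
    w_lidar = lidar_weight(distance_m)
    w_uav = (1.0 - w_lidar) * uav_confidence
    # w_lidar is always strictly positive, so the normaliser is never zero.
    return (w_lidar * p_lidar + w_uav * p_uav) / (w_lidar + w_uav)


def fuse_grid(lidar_grid, uav_grid, cell_size_m, ego_rc):
    """Fuse two same-shaped BEV grids given as lists of rows.

    ego_rc is the (row, col) index of the cell containing the UGV.
    """
    fused = []
    for r, (lidar_row, uav_row) in enumerate(zip(lidar_grid, uav_grid)):
        fused_row = []
        for c, (p_l, p_u) in enumerate(zip(lidar_row, uav_row)):
            # Euclidean distance from the ego cell, converted to metres.
            d = cell_size_m * math.hypot(r - ego_rc[0], c - ego_rc[1])
            fused_row.append(fuse_cell(p_l, p_u, d))
        fused.append(fused_row)
    return fused
```

For a cell at the vehicle's position, the fused value follows the LiDAR estimate almost entirely; for a cell 200 m away, it follows the UAV estimate. A learned model would replace the fixed gate with attention weights conditioned on the features, but the qualitative behaviour this sketch aims to illustrate is the same.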
Zhang et al. studied this question.