The reliable operation of unmanned aerial vehicles (UAVs) in low-altitude economies requires robust obstacle avoidance, yet unimodal sensing fails under extreme lighting or weather. This paper presents a real-time obstacle avoidance system based on multimodal adaptive fusion of photoelectric and nano-radar sensors within a Transformer architecture. The system employs an end-to-end design with dual-stream heterogeneous feature extraction: a modified YOLOv5s processes photoelectric images for semantic features, while an adapted PointNet handles nano-radar point clouds for spatial geometry. A cross-modal multi-head self-attention mechanism dynamically fuses these features, overcoming the limitations of manually predefined modality weights. This design leverages the complementary nature of photoelectric sensors (high-resolution texture) and nano-radar (penetrating capability and precise depth), addressing nanoscale-level positioning challenges in dynamic environments. Experimental results on a custom Unity 3D dataset show that the system achieves a mean average precision (mAP) of 95.8% under ideal conditions. Notably, performance degradation under extreme interference (glare, backlight, rain, fog) is limited to under 6%, compared with over 30% for unimodal systems. The end-to-end response latency is 32.6 ms on an NVIDIA Jetson Xavier NX edge device, with a 99.2% average obstacle avoidance success rate. By enabling deep feature interaction and dynamic adaptive weighting, the proposed system significantly enhances environmental robustness and real-time perception, providing a reliable hardware-software co-design solution for autonomous UAV navigation in complex low-altitude airspace.
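To make the fusion step concrete, the following is a minimal NumPy sketch of multi-head self-attention applied to a concatenated sequence of image and radar feature tokens. It is an illustration under stated assumptions, not the authors' implementation: random arrays stand in for the YOLOv5s and PointNet features, the projection weights are random rather than trained, and every name (`fuse`, `image_share`, `d_model`, the token counts, the head count) is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention over tokens of shape (n, d).

    Returns the attended tokens (n, d) and the attention maps
    of shape (num_heads, n, n).
    """
    n, d = tokens.shape
    head_dim = d // num_heads

    def split_heads(x):
        # (n, d) -> (num_heads, n, head_dim)
        return x.reshape(n, num_heads, head_dim).transpose(1, 0, 2)

    q = split_heads(tokens @ w_q)
    k = split_heads(tokens @ w_k)
    v = split_heads(tokens @ w_v)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    attn = softmax(scores, axis=-1)

    # Merge heads back to (n, d) and apply the output projection.
    merged = (attn @ v).transpose(1, 0, 2).reshape(n, d)
    return merged @ w_o, attn


def fuse(image_tokens, radar_tokens, params, num_heads=4):
    """Concatenate both modalities so each token can attend to the other."""
    tokens = np.concatenate([image_tokens, radar_tokens], axis=0)
    fused, attn = multi_head_self_attention(tokens, *params, num_heads=num_heads)
    n_img = image_tokens.shape[0]
    # Average fraction of attention mass directed at image tokens.
    # It depends on the inputs, unlike a manually fixed modality weight.
    image_share = float(attn[:, :, :n_img].sum(axis=-1).mean())
    return fused, attn, image_share


# Assumed sizes: 6 image tokens, 4 radar tokens, 32-dimensional features.
rng = np.random.default_rng(0)
d_model = 32
image_tokens = rng.normal(size=(6, d_model))   # stand-in for YOLOv5s features
radar_tokens = rng.normal(size=(4, d_model))   # stand-in for PointNet features
params = tuple(
    rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4)
)

fused, attn, image_share = fuse(image_tokens, radar_tokens, params)
print(fused.shape, attn.shape, round(image_share, 3))
```

In this sketch, `image_share` is recomputed from the attention maps for every input, so the balance between modalities can shift when one sensor's features become less informative. That is the property the abstract contrasts with manually predefined modality weights.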
Qu et al. (Thu,) studied this question.