To address the escalating security challenges posed by unauthorized Unmanned Aerial Vehicles (UAVs), this paper presents a Sim2real physics-informed audio–visual fusion simulation platform designed to improve Counter-UAV (C-UAV) detection and tracking performance. The proposed method integrates two complementary sensing pipelines: a physics-based acoustic localization system built on Time Difference of Arrival (TDOA) principles and a deep learning-driven visual detection framework. To ensure robust surveillance of non-cooperative targets, the two pipelines are fused through strict spatiotemporal synchronization and also reinforce each other: acoustic data guides visual attention in the low-visibility scenarios typical of adversarial intrusions, while visual detections refine acoustic parameter estimation. Building on prior work in multi-modal perception, we extend the framework to dynamic environments with configurable visual obstructions, including smoke and fog, which frequently compromise conventional optical anti-drone systems. Experiments demonstrate that the fusion system progressively adapts to degraded visual conditions, extending tracking continuity from approximately 50% coverage under vision-only operation to near-continuous target awareness, at the cost of a moderate reduction in average angular precision when acoustic-only segments are included. Physical validation with quadrotor UAVs confirms the platform's capability to bridge the simulation-to-reality gap. Our results highlight the system's robustness against sensor degradation and its potential to accelerate the development of resilient multisensor C-UAV systems while reducing dependence on costly field testing.
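As a rough illustration of the TDOA principle mentioned above, the following minimal sketch estimates a bearing from two microphones. It is not the paper's implementation: the two-microphone far-field geometry, the plain cross-correlation delay estimator, the speed-of-sound constant, and the function names `estimate_delay` and `bearing_from_delay` are all assumptions made for this example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def estimate_delay(sig_a, sig_b, sample_rate):
    """Delay of sig_b relative to sig_a in seconds (positive if sig_b lags).

    Hypothetical helper: uses the peak of the full cross-correlation.
    """
    corr = np.correlate(sig_b, sig_a, mode="full")
    # Index 0 of the 'full' output corresponds to a lag of -(len(sig_a) - 1).
    lag_samples = int(np.argmax(corr)) - (len(sig_a) - 1)
    return lag_samples / sample_rate


def bearing_from_delay(delay_s, mic_spacing_m, speed=SPEED_OF_SOUND):
    """Far-field bearing in radians, measured from the array broadside.

    Assumes a plane wave, so delay = spacing * sin(bearing) / speed.
    """
    ratio = np.clip(speed * delay_s / mic_spacing_m, -1.0, 1.0)
    return float(np.arcsin(ratio))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 48_000                      # sample rate in Hz (assumed)
    source = rng.standard_normal(2048)
    shift = 5                        # simulated inter-microphone delay in samples
    mic_a = source
    mic_b = np.concatenate([np.zeros(shift), source[:-shift]])

    tau = estimate_delay(mic_a, mic_b, fs)
    theta = bearing_from_delay(tau, mic_spacing_m=0.2)
    print(f"delay = {tau * 1e6:.1f} us, bearing = {np.degrees(theta):.2f} deg")
```

A single microphone pair only resolves one angle and leaves a front-back ambiguity; a practical array would combine several pairs, and a more robust delay estimator may be preferable in noisy or reverberant conditions.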
Nian et al. studied this question.