What question did this study set out to answer?

The central aim is to improve detection and tracking performance for unauthorized UAVs using a novel simulation platform.

March 12, 2026 | Open Access

Audio–Visual Fusion Sim2Real Platform for Anti-UAV Detection and Tracking

Key Points

Developed a physics-informed audio-visual fusion platform
Utilized acoustic localization based on Time Difference of Arrival
Implemented a deep learning-based visual detection framework
Fused sensory data through spatiotemporal synchronization
Tested in dynamic environments with visual obstructions like smoke and fog
Achieved nearly continuous target awareness under low-visibility conditions
Increased tracking continuity from approximately 50% coverage under vision-only operation to near-continuous coverage, with a moderate trade-off in average angular precision
Demonstrated robust performance despite sensor degradation
Reduced reliance on costly field testing by bridging simulation and reality
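
To make the tracking-continuity figure concrete, the sketch below computes continuity as the fraction of frames that carry a bearing estimate, before and after an acoustic fallback. It is only an illustration: the function names (fuse_bearing, continuity), the simple "prefer vision, fall back to audio" rule, and the per-frame values are all assumptions made for this example, not the study's fusion method or data.

```python
from typing import List, Optional

Bearing = Optional[float]  # bearing in degrees, or None when the sensor has no estimate


def fuse_bearing(visual_deg: Bearing, acoustic_deg: Bearing) -> Bearing:
    """Hypothetical fusion rule: use the visual bearing when available,
    otherwise fall back to the acoustic bearing (None if both are missing)."""
    if visual_deg is not None:
        return visual_deg
    return acoustic_deg


def continuity(track: List[Bearing]) -> float:
    """Fraction of frames in which a bearing estimate exists."""
    if not track:
        return 0.0
    return sum(b is not None for b in track) / len(track)


# Synthetic toy frames (NOT results from the study): vision drops out in "smoke".
visual = [10.0, 11.0, None, None, 12.0, None, None, 13.0]
acoustic = [9.0, 12.0, 11.5, 12.5, None, None, 12.0, 14.0]

fused = [fuse_bearing(v, a) for v, a in zip(visual, acoustic)]

print(f"vision-only continuity: {continuity(visual):.3f}")
print(f"fused continuity:       {continuity(fused):.3f}")
```

In this toy sequence the fused track covers more frames than the vision-only track, but the frames filled by audio inherit the coarser acoustic bearing, which is the kind of precision trade-off the key points mention.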

Abstract

To address the escalating security challenges posed by unauthorized Unmanned Aerial Vehicles, this paper presents a Sim2Real physics-informed audio–visual fusion simulation platform designed to enhance Counter-Unmanned Aerial Vehicle detection and tracking performance. The proposed method integrates two complementary sensing pipelines: a physics-based acoustic localization system utilizing Time Difference of Arrival principles and a deep learning-driven visual detection framework. To ensure robust surveillance against non-cooperative targets, these pipelines are fused through strict spatiotemporal synchronization and also reinforce each other: acoustic data guides visual attention in low-visibility scenarios typical of adversarial intrusions, while visual detections refine acoustic parameter estimation. Building upon prior work in multi-modal perception, we extend the framework to dynamic environments characterized by configurable visual obstructions, including smoke and fog, which frequently compromise conventional optical anti-drone systems. Experiments demonstrate that the fusion system progressively adapts to degraded visual conditions, extending tracking continuity from approximately 50% coverage under vision-only operation to near-continuous target awareness, with a moderate trade-off in average angular precision when acoustic-only segments are included. Physical validation with quadrotor Unmanned Aerial Vehicles confirms the platform’s capability to bridge simulation-to-reality gaps. Our results highlight the system’s robustness against sensor degradation and its potential to accelerate the development of resilient multisensor Counter-Unmanned Aerial Vehicle systems while reducing dependency on costly field testing.
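
As a rough illustration of the Time Difference of Arrival principle named in the abstract, the sketch below estimates the delay between two microphone signals with a brute-force cross-correlation and converts it to a far-field bearing. The two-microphone geometry, the far-field approximation, the speed-of-sound constant, and the function names (estimate_delay_samples, tdoa_bearing_deg) are assumptions chosen for this example; the abstract does not specify the array layout or estimator used in the platform.

```python
import math
from typing import Sequence

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def estimate_delay_samples(ref: Sequence[float], other: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) that maximizes the cross-correlation.
    A positive lag means the sound reaches `other` later than `ref`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                score += r * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def tdoa_bearing_deg(delay_s: float, mic_spacing_m: float, c: float = SPEED_OF_SOUND) -> float:
    """Far-field bearing in degrees from broadside for a two-microphone pair:
    sin(theta) = c * delay / spacing."""
    ratio = c * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp: noise can push the ratio out of range
    return math.degrees(math.asin(ratio))


# Synthetic example: a single pulse arriving 3 samples later at the second mic.
fs = 48_000  # sampling rate in Hz (assumed)
ref = [0.0] * 32
other = [0.0] * 32
ref[10] = 1.0
other[13] = 1.0

lag = estimate_delay_samples(ref, other, max_lag=8)
bearing = tdoa_bearing_deg(lag / fs, mic_spacing_m=0.2)
print(f"lag = {lag} samples, bearing = {bearing:.2f} degrees from broadside")
```

A single microphone pair only resolves one angle and leaves a front-back ambiguity; localizing a target in space needs more microphones and more delay measurements, which this sketch does not attempt.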
