ABSTRACT

UAV-based visible-thermal object detection is crucial for all-weather aerial perception. However, prevailing methods train separate models for the visible, thermal, and fusion tasks, which incurs significant computational redundancy. Moreover, from the typical UAV viewpoint, objects often appear at arbitrary orientations, making oriented bounding boxes better suited than horizontal ones to accurate localization. To address both the efficiency challenge and the need for precise orientation-aware detection, we propose TUMODet, a unified multi-task oriented detection framework. Built upon the spatial transform decoupling (STD) detector, TUMODet incorporates task-token prompt embedding and static weighted fusion, enabling a single model to dynamically handle three detection tasks: visible, thermal, and fused. Comprehensive experiments on the DroneVehicle dataset demonstrate that the unified framework significantly improves parameter efficiency and resource utilization while achieving competitive performance across all tasks, particularly in the challenging fusion scenario. The results validate TUMODet as an efficient and practical solution for unified multimodal detection on resource-constrained UAV platforms.
Sun et al. studied this question.
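
As a rough illustration of how a task token and static weighted fusion could let one model serve all three tasks, the sketch below blends visible and thermal token sequences with fixed per-task weights and prepends a task-specific token. This is not the TUMODet implementation: the function name `build_input_tokens`, the weight values in `FUSION_WEIGHTS`, and the toy token table are all assumptions made for illustration, since the abstract does not specify them.

```python
# Hypothetical fixed (static) fusion weights per task, as (visible, thermal).
# The actual values used by TUMODet are not given in the abstract.
FUSION_WEIGHTS = {
    "visible": (1.0, 0.0),
    "thermal": (0.0, 1.0),
    "fusion": (0.5, 0.5),
}


def build_input_tokens(vis_tokens, thr_tokens, task, task_token_table):
    """Blend two modality token sequences and prepend a task token.

    vis_tokens, thr_tokens: lists of equal-length feature vectors.
    task_token_table: maps a task name to its (assumed learned) token vector.
    """
    w_vis, w_thr = FUSION_WEIGHTS[task]
    fused = [
        [w_vis * v + w_thr * t for v, t in zip(vis_row, thr_row)]
        for vis_row, thr_row in zip(vis_tokens, thr_tokens)
    ]
    # The task token tells the shared detector which task it is solving.
    return [list(task_token_table[task])] + fused


# Toy inputs: two tokens of dimension two per modality.
vis = [[1.0, 2.0], [3.0, 4.0]]
thr = [[3.0, 0.0], [1.0, 2.0]]
# Stand-in for a learned embedding table, one vector per task.
table = {"visible": [0.1, 0.1], "thermal": [0.2, 0.2], "fusion": [0.3, 0.3]}

tokens = build_input_tokens(vis, thr, "fusion", table)
print(tokens)  # [[0.3, 0.3], [2.0, 1.0], [2.0, 3.0]]
```

Under these assumptions, switching tasks changes only the task token and the fixed weight pair, so the same downstream detector parameters are shared across the visible, thermal, and fusion tasks.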