What question did this study set out to answer?

The study aims to develop a robust UAV detection system that combines RGB and thermal infrared imaging, so that detection holds up in challenging scenarios where single-modality methods degrade.

January 21, 2026 · Open Access

DCAM-DETR: Dual Cross-Attention Mamba Detection Transformer for RGB–Infrared Anti-UAV Detection

Key Points

The proposed DCAM-DETR framework fuses RGB and thermal infrared modalities through an enhanced RT-DETR architecture.
A MobileMamba backbone uses selective state space models to model long-range dependencies with linear complexity.
Cross-Dimensional Attention (CDA) and Cross-Path Attention (CPA) modules capture intermodal correlations across spatial and channel dimensions.
An Adaptive Feature Fusion Module (AFFM) dynamically adjusts the contributions of the multimodal features.
A Dual-Attention Decoupling Module (DADM) improves the detection head's discrimination of small targets.
On the Anti-UAV300 dataset, the model reaches 94.7% mAP@0.5 and 78.3% mAP@0.5:0.95 at 42 FPS.
The authors report these results as state of the art relative to existing methods.
Extended evaluations on the FLIR-ADAS and KAIST datasets support the framework's generalization across diverse scenarios.
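
The two reported metrics differ only in how strictly a predicted box must overlap the ground truth. As a minimal sketch, assuming the common COCO-style convention in which mAP@0.5:0.95 averages over IoU thresholds from 0.5 to 0.95 in steps of 0.05 (the summary does not spell out the evaluation protocol), the code below computes the IoU of one predicted box against one ground-truth box and lists which thresholds it clears. The function names and the `(x1, y1, x2, y2)` box format are illustrative choices, not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width and height, clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, gt_box):
    """Return the IoU thresholds at which pred_box counts as a match."""
    score = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if score >= t]


if __name__ == "__main__":
    pred = (0, 0, 10, 10)
    gt = (1, 0, 11, 10)  # same size, shifted one pixel to the right
    print(iou(pred, gt))
    print(thresholds_passed(pred, gt))
```

A one-pixel shift on a 10×10 box already fails the strictest thresholds, which is one reason small targets such as distant UAVs tend to score lower on mAP@0.5:0.95 than on mAP@0.5.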

Abstract

The proliferation of unmanned aerial vehicles (UAVs) poses escalating security threats across critical infrastructures, necessitating robust real-time detection systems. Existing vision-based methods predominantly rely on single-modality data and exhibit significant performance degradation under challenging scenarios. To address these limitations, we propose DCAM-DETR, a novel multimodal detection framework that fuses RGB and thermal infrared modalities through an enhanced RT-DETR architecture integrated with state space models. Our approach introduces four innovations: (1) a MobileMamba backbone leveraging selective state space models for efficient long-range dependency modeling with linear complexity O(n); (2) Cross-Dimensional Attention (CDA) and Cross-Path Attention (CPA) modules capturing intermodal correlations across spatial and channel dimensions; (3) an Adaptive Feature Fusion Module (AFFM) dynamically calibrating multimodal feature contributions; and (4) a Dual-Attention Decoupling Module (DADM) enhancing detection head discrimination for small targets. Experiments on Anti-UAV300 demonstrate state-of-the-art performance with 94.7% mAP@0.5 and 78.3% mAP@0.5:0.95 at 42 FPS. Extended evaluations on FLIR-ADAS and KAIST datasets validate the generalization capacity across diverse scenarios.
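
The idea of dynamically calibrating multimodal feature contributions can be illustrated with a toy example. The sketch below is not the paper's AFFM, whose internals the abstract does not describe. It is a generic softmax-gated weighted sum, and it assumes a hypothetical gating rule in which each modality is scored by its mean absolute activation. The names `adaptive_fuse`, `rgb_feat`, and `ir_feat` are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(rgb_feat, ir_feat):
    """Fuse two equal-length feature vectors with input-dependent weights.

    Assumption (not from the paper): a modality's score is its mean
    absolute activation, so the more active modality gets more weight.
    Returns the fused vector and the (rgb, ir) weights.
    """
    if not rgb_feat or len(rgb_feat) != len(ir_feat):
        raise ValueError("feature vectors must be non-empty and equal length")
    rgb_score = sum(abs(v) for v in rgb_feat) / len(rgb_feat)
    ir_score = sum(abs(v) for v in ir_feat) / len(ir_feat)
    w_rgb, w_ir = softmax([rgb_score, ir_score])
    fused = [w_rgb * r + w_ir * t for r, t in zip(rgb_feat, ir_feat)]
    return fused, (w_rgb, w_ir)


if __name__ == "__main__":
    # Toy night-time case: RGB features are nearly flat, thermal is strong.
    fused, weights = adaptive_fuse([0.0, 0.1, 0.0], [2.0, 1.5, 2.5])
    print(weights)
    print(fused)
```

In this toy case the thermal weight dominates because the RGB vector carries almost no signal. A learned module would typically compute the weights with trainable layers rather than a fixed statistic, but the weighted-sum structure is the same.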
