June 3, 2026 | Open Access

CFP-DETR: Collaborative Feature Purification Network with Spatial Alignment for Aerial Small Object Detection

Key Points

This research aims to enhance small object detection in aerial imagery by addressing challenges like target sparsity and environmental interference.
Developed the Collaborative Feature Purification Detection Transformer (CFP-DETR) and a lightweight variant, Efficient Lightweight CFP-DETR (EL-CFP-DETR).
Implemented a Global Context Denoising Module (GCDM) to reduce environmental noise and enhance salient features.
Integrated an Adaptive Cross-scale Feature Alignment (ACFA) module to resolve spatial misalignment, and a Fine-Grained Detail Injection Module (FGDIM) to recover shallow high-resolution details.
CFP-DETR increases AP50 by 1.64% and APSval by 4.03% on the SeaDronesSee dataset compared to the RT-DETR baseline.
EL-CFP-DETR reduces parameters by 18% to 16.4M while maintaining competitive detection accuracy and reaching 42.8 FPS.
CFP-DETR achieves an inference speed of 37.72 FPS, a 31.2% improvement over RT-DETR.
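
The relative figures above can be turned back into approximate reference values with simple arithmetic. The sketch below is illustrative only: the helper names are hypothetical, the reported percentages are rounded, and the source does not state which model the 18% parameter reduction is measured against, so the results are rough implied values rather than reported ones.

```python
def implied_reference_after_reduction(value, reduction_pct):
    """Value before a relative reduction: value = reference * (1 - pct/100)."""
    return value / (1.0 - reduction_pct / 100.0)


def implied_reference_after_gain(value, gain_pct):
    """Value before a relative gain: value = reference * (1 + pct/100)."""
    return value / (1.0 + gain_pct / 100.0)


# EL-CFP-DETR: parameters reduced by 18% to 16.4M (figures from the Key Points).
implied_params_m = implied_reference_after_reduction(16.4, 18.0)

# CFP-DETR: 37.72 FPS, stated as a 31.2% improvement over RT-DETR.
implied_rtdetr_fps = implied_reference_after_gain(37.72, 31.2)

print(f"Implied reference parameter count: {implied_params_m:.1f}M")
print(f"Implied RT-DETR speed: {implied_rtdetr_fps:.2f} FPS")
```

Under these assumptions the reference model would have roughly 20M parameters, and RT-DETR would run at roughly 28.75 FPS in the same test setup.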

Abstract

Object detection in aerial imagery faces extreme target sparsity and high-intensity environmental interference, causing weak targets to be submerged in background clutter. To address this, we propose a Collaborative Feature Purification Detection Transformer (CFP-DETR), which reconstructs discriminative target representations through a collaborative feature purification mechanism. Specifically, the Global Context Denoising Module (GCDM) first suppresses environmental noise at the semantic level to enhance target saliency. The purified features are then fused across scales through an Adaptive Cross-scale Feature Alignment (ACFA) module, which resolves spatial misalignment that otherwise dilutes small-object features during multi-level interaction. Concurrently, a Fine-Grained Detail Injection Module (FGDIM) recovers shallow high-resolution details and injects them into the semantic flow, compensating for information loss caused by progressive downsampling. Together, these modules denoise, align, and recover features to counteract submergence at different stages. Additionally, an efficient lightweight variant, Efficient Lightweight CFP-DETR (EL-CFP-DETR), reconstructs the backbone with partial convolution and structural re-parameterization to improve efficiency while maintaining competitive detection accuracy. Extensive experiments across five datasets validate the effectiveness of this collaborative design. On the SeaDronesSee dataset, CFP-DETR increases AP50 and APSval by 1.64% and 4.03% over the baseline, while EL-CFP-DETR reduces parameters by 18% to 16.4M and GFLOPs by 15% to 48.3, reaching 42.8 FPS. Notably, CFP-DETR achieves an inference speed of 37.72 FPS, a 31.2% improvement over the baseline Real-Time Detection Transformer (RT-DETR).
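
The abstract does not say which re-parameterization scheme EL-CFP-DETR uses. As a generic illustration of the idea, the sketch below folds a batch-normalization layer into the preceding convolution, a common building block of structural re-parameterization: the network trains with separate layers and runs inference with a single fused one. It uses a 1x1 convolution on a single pixel and plain Python lists for brevity; all function names and toy values are assumptions, not taken from the paper.

```python
import math


def conv1x1(x, weight, bias):
    """Apply a 1x1 convolution to one pixel; x holds C_in channel values."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization using fixed running statistics."""
    return [g * (v - m) / math.sqrt(s + eps) + bt
            for v, g, bt, m, s in zip(y, gamma, beta, mean, var)]


def fuse_conv_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BN into the conv: W' = W * scale, b' = (b - mean) * scale + beta."""
    fused_w, fused_b = [], []
    for row, b, g, bt, m, s in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(s + eps)
        fused_w.append([w * scale for w in row])
        fused_b.append((b - m) * scale + bt)
    return fused_w, fused_b


# Toy layer: 3 input channels, 2 output channels (hypothetical values).
weight = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
bias = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 1.6]
x = [1.0, 2.0, -1.0]

two_step = batchnorm(conv1x1(x, weight, bias), gamma, beta, mean, var)
fused_w, fused_b = fuse_conv_bn(weight, bias, gamma, beta, mean, var)
one_step = conv1x1(x, fused_w, fused_b)

max_diff = max(abs(a - b) for a, b in zip(two_step, one_step))
print("fused output matches:", max_diff < 1e-9)
```

The fused layer produces the same output with one operation instead of two, which is the kind of inference-time saving that re-parameterization targets; the actual EL-CFP-DETR backbone may merge different branches.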
