February 2, 2026 · Open Access

SAMKD: A Hybrid Lightweight Algorithm Based on Selective Activation and Masked Knowledge Distillation for Multimodal Object Detection

Key Points

Aimed to improve the efficiency of multimodal object detection on UAV edge devices while maintaining detection performance.
Introduced SAMKD, a hybrid lightweight algorithm that combines selective activation pruning and masked knowledge distillation in a staged manner.
Developed a selective activation network pruning model (SAPM) to reduce redundant computation and generate a lightweight student network.
Implemented a mask binary classification knowledge distillation (MBKD) strategy to help the student network recover missing representation patterns.
Constructed M2UD-18K, a multimodal UAV aerial imagery dataset with 9 types of targets and over 18,000 pairs, for testing.
Achieved a favorable trade-off between detection accuracy and detection speed on both the M2UD-18K and DroneVehicle datasets.
Demonstrated stable accuracy when pruning and distillation are combined.
Mitigated the foreground–background imbalance in UAV aerial imagery by reformulating classification logits into foreground–background binary mappings.
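
As a rough illustration of activation-based pruning in general, the sketch below scores each channel by its mean absolute activation over a small calibration batch and keeps the highest-scoring fraction. This is a generic heuristic assumed here for illustration only. It is not the SAPM procedure from the paper, and the names `channel_scores`, `select_channels`, and `keep_ratio` are hypothetical.

```python
import math


def channel_scores(activations):
    """Mean absolute activation per channel.

    `activations` is a list of samples; each sample is a list of channels;
    each channel is a flat list of activation values (assumed layout).
    """
    n_channels = len(activations[0])
    totals = [0.0] * n_channels
    counts = [0] * n_channels
    for sample in activations:
        for c, values in enumerate(sample):
            totals[c] += sum(abs(v) for v in values)
            counts[c] += len(values)
    return [t / n if n else 0.0 for t, n in zip(totals, counts)]


def select_channels(scores, keep_ratio):
    """Return the sorted indices of the channels to keep."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, math.ceil(keep_ratio * len(scores)))
    # Rank channels from most to least active and keep the top k.
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return sorted(ranked[:k])


# One calibration sample with four channels of two values each.
batch = [[[0.0, 0.1], [2.0, -3.0], [0.5, 0.5], [-1.0, 1.0]]]
kept = select_channels(channel_scores(batch), keep_ratio=0.5)
```

In a real detector the kept indices would be used to slice the corresponding convolution weights. How SAPM makes that choice dynamically per input is not specified in this summary.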

Abstract

Multimodal object detection is currently a research hotspot in computer vision. However, the fusion of visible and infrared modalities inevitably increases computational complexity, making most high-performance detection models difficult to deploy on resource-constrained UAV edge devices. Although pruning and knowledge distillation are widely used for model compression, applying them independently often leads to an unstable accuracy–efficiency trade-off. Therefore, this paper proposes a hybrid lightweight algorithm named SAMKD, which combines selective activation pruning with masked knowledge distillation in a staged manner to improve efficiency while maintaining detection performance. Specifically, the selective activation network pruning model (SAPM) first reduces redundant computation by dynamically adjusting network weights and the activation state of input data to generate a lightweight student network. Then, the mask binary classification knowledge distillation (MBKD) strategy is introduced to compensate for the performance degradation caused by pruning: it guides the student network to recover missing representation patterns under masked feature learning. Moreover, MBKD reformulates classification logits into multiple foreground–background binary mappings, alleviating the severe foreground–background imbalance commonly observed in UAV aerial imagery. This paper also constructs a multimodal UAV aerial imagery object detection dataset, M2UD-18K, which includes 9 types of targets and over 18,000 image pairs. Extensive experiments show that SAMKD performs well on the self-constructed M2UD-18K dataset, as well as on the public DroneVehicle dataset, achieving a favorable trade-off between detection accuracy and detection speed.
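
One way to picture the reformulation of classification logits into foreground–background binary mappings is sketched below. Each class logit is treated as an independent "this class vs. background" probability via a sigmoid, and the student is penalized by the binary KL divergence from the teacher, averaged over classes. This is a minimal sketch under assumed definitions, not the MBKD loss from the paper; the function names and the `temperature` parameter are hypothetical, and the masked feature learning component is not covered.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binary_kl(p, q, eps=1e-7):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def binary_mapping_distill_loss(teacher_logits, student_logits, temperature=1.0):
    """Average per-class binary KL between teacher and student.

    Each logit is mapped to its own foreground-vs-background probability,
    so classes do not compete as they would under a softmax.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must have the same number of classes")
    total = 0.0
    for t, s in zip(teacher_logits, student_logits):
        total += binary_kl(sigmoid(t / temperature), sigmoid(s / temperature))
    return total / len(teacher_logits)


# A mostly-background prediction: all teacher logits are strongly negative.
teacher = [-4.0, -3.5, 2.0]
student = [-1.0, -0.5, 0.5]
loss = binary_mapping_distill_loss(teacher, student)
```

Because each class is scored separately, a location where every class is "background" still produces a well-defined target for every binary mapping. This is one plausible reason such a formulation can ease foreground–background imbalance.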
