Lightweight network architectures are crucial for enabling autonomous and intelligent monitoring with unmanned aerial vehicles (UAVs). However, conventional state-of-the-art detectors often rely on computationally intensive stacks of convolutional operations to improve detection accuracy, resulting in substantial memory overhead and considerable computational redundancy. To address these limitations, we propose FFKD-Net, a lightweight object detection framework designed for efficient and accurate remote-sensing object detection. First, we introduce an efficient feature enhancement network (EFEN) by integrating the MBConv module from MobileNetV3 with an embedded squeeze-and-excitation (SE) channel attention mechanism, thereby improving feature representation with limited computational cost. Second, we develop a multi-scale feature fusion strategy (MFFS) that combines channel-wise mapping with upsampling-based spatial alignment, enabling high-fidelity cross-layer feature fusion while alleviating the loss of fine-grained details. Third, we propose a hierarchical knowledge distillation module (HKDM), which employs soft-label supervision to refine classification decision boundaries and imposes regression consistency constraints to enhance localization robustness. Extensive experiments on two benchmark remote-sensing datasets, VisDrone and DIOR, demonstrate the effectiveness and efficiency of the proposed method. On VisDrone, FFKD-Net achieves an mAP₅₀ of 47.7% with only 3.0M parameters; on DIOR, it attains an mAP of 70.8%, further validating its strong accuracy–efficiency trade-off.
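The two distillation terms named for HKDM, soft-label supervision and a regression consistency constraint, can be illustrated with a minimal sketch. The abstract does not give the exact loss formulation, so everything below is an assumption made for illustration: the temperature-scaled KL divergence for the soft-label term, the mean squared error for the box term, and the hypothetical names `soft_label_loss`, `regression_consistency_loss`, `distillation_loss`, `temperature`, `alpha`, and `beta`.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_loss(student_logits, teacher_logits, temperature=2.0):
    """Assumed soft-label term: KL(teacher || student) on softened
    class distributions, scaled by temperature squared."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def regression_consistency_loss(student_box, teacher_box):
    """Assumed consistency term: mean squared difference between the
    student's and the teacher's predicted box coordinates."""
    diffs = [(s - t) ** 2 for s, t in zip(student_box, teacher_box)]
    return sum(diffs) / len(diffs)


def distillation_loss(student_logits, teacher_logits,
                      student_box, teacher_box,
                      temperature=2.0, alpha=1.0, beta=1.0):
    """Weighted sum of the two terms; alpha and beta are hypothetical weights."""
    cls_term = soft_label_loss(student_logits, teacher_logits, temperature)
    reg_term = regression_consistency_loss(student_box, teacher_box)
    return alpha * cls_term + beta * reg_term


if __name__ == "__main__":
    loss = distillation_loss(
        student_logits=[1.0, 0.5, -0.2],
        teacher_logits=[2.0, 0.1, -1.0],
        student_box=[10.0, 12.0, 50.0, 60.0],
        teacher_box=[11.0, 12.0, 52.0, 59.0],
    )
    print(f"combined distillation loss: {loss:.4f}")
```

In this sketch both terms vanish when the student reproduces the teacher's outputs exactly, so the combined loss only penalizes disagreement with the teacher; a full training objective would normally add the ordinary ground-truth detection loss as well.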
Chen et al. (Sun,) studied this question.