Abstract: This paper presents Cosine Similarity-Based Knowledge Distillation (CSKD) for robust, lightweight object detectors. Knowledge Distillation (KD) has been effective in enhancing the performance of compact models in image classification by leveraging deep CNN models. However, the complex and multifaceted nature of object detection, characterized by its modular design and multitasking requirements, poses significant challenges for traditional KD techniques. These challenges are further compounded by the conventional reliance on the Mean Squared Error (MSE) loss function and the limited application of enhanced feature representations to the training phase. Addressing these limitations, the proposed CSKD method combines cosine similarity guidance with MSE loss to facilitate a more effective knowledge transfer from the teacher model to the student model. This is achieved by distilling both intermediate features and prediction outputs, aided by an assistant prediction branch designed to learn directly from the teacher's predictions. This dual-faceted distillation strategy enables the student model to better mimic the teacher model's behavior, leading to improved performance. The proposed method demonstrates versatility and robustness across various object detector architectures without the need for additional feature enhancement layers during training. Notably, employing ResNet-50 as the teacher model and ResNet-18 as the student model, we achieve new benchmarks in KD for object detection across several architectures, including Faster-RCNN, RetinaNet, FCOS, and GFL, with respective mAP scores of 36.6, 35.2, 35.9, and 38.9. These results highlight the effectiveness of CSKD in advancing the state of the art in KD for object detection, offering a compelling solution to the challenges previously faced by traditional KD methods in this domain. The code of the proposed CSKD is available at https://github.com/swkdn16/CSKD
Park et al. studied this question.
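
The abstract describes combining cosine similarity guidance with MSE loss when distilling features from teacher to student, but it does not give the exact formulation. The following is a minimal sketch of one plausible reading, assuming a simple weighted sum of an MSE term and a (1 - cosine similarity) term over flattened feature vectors. The function names and the weights `alpha` and `beta` are hypothetical and are not taken from the paper or its repository.

```python
import math


def mse(student, teacher):
    """Mean squared error between two equal-length feature vectors."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def cosine_similarity(student, teacher, eps=1e-8):
    """Cosine of the angle between two feature vectors (eps avoids division by zero)."""
    dot = sum(s * t for s, t in zip(student, teacher))
    norm_s = math.sqrt(sum(s * s for s in student))
    norm_t = math.sqrt(sum(t * t for t in teacher))
    return dot / max(norm_s * norm_t, eps)


def combined_distill_loss(student, teacher, alpha=1.0, beta=1.0):
    """Weighted sum of MSE and (1 - cosine similarity).

    Assumption: alpha and beta are illustrative weights, not values
    reported in the paper.
    """
    cos_term = 1.0 - cosine_similarity(student, teacher)
    return alpha * mse(student, teacher) + beta * cos_term


if __name__ == "__main__":
    teacher_feat = [1.0, 2.0, 3.0]
    scaled_student = [2.0, 4.0, 6.0]      # same direction, different magnitude
    flipped_student = [-1.0, -2.0, -3.0]  # opposite direction

    print(combined_distill_loss(teacher_feat, teacher_feat))
    print(combined_distill_loss(scaled_student, teacher_feat))
    print(combined_distill_loss(flipped_student, teacher_feat))
```

In this sketch, a student feature that points in the same direction as the teacher's but has a different magnitude incurs only the MSE penalty, while a feature pointing the opposite way is penalized by both terms. This illustrates why the two terms carry complementary signals: MSE is sensitive to magnitude, and the cosine term is sensitive only to direction.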