In knowledge distillation (KD), a lightweight student model achieves higher test accuracy by mimicking the behaviour of a large pre-trained model (the teacher). However, the cumbersome teacher often produces over-confident responses, which leads to poor generalization on unseen data, and a student trained by such a teacher inherits this problem. To mitigate this issue, we present a new KD framework, dubbed coded knowledge distillation (CKD), in which the student is instead trained to mimic the behaviour of a coded teacher. Compared to the teacher in KD, the coded teacher in CKD has an additional adaptive encoding layer at its front, which adaptively encodes an input image into a compressed version (using JPEG encoding, for instance) and then feeds the compressed image to the pre-trained teacher. Comprehensive experimental results show the effectiveness of CKD over KD. In addition, we extend the deployment of a coded teacher to other knowledge transfer methods, showcasing its ability to enhance test accuracy across these methods.
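The coded-teacher idea described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. The uniform quantizer `quantize` is a hypothetical stand-in for a real lossy codec such as JPEG, and the `levels` argument is supplied by the caller because the abstract does not say how the encoding is adapted per input. The loss `kd_loss` is a plain temperature-softened KL divergence, a common choice in KD that is assumed here rather than taken from the paper. All function names are illustrative.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> Vector:
    """Temperature-softened softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def quantize(pixels: Sequence[float], levels: int) -> Vector:
    """Hypothetical stand-in for a lossy codec (NOT JPEG):
    snap each pixel in [0, 1] to one of `levels` uniformly spaced values."""
    steps = levels - 1
    return [round(p * steps) / steps for p in pixels]


def make_coded_teacher(
    teacher: Callable[[Vector], Vector],
    encoder: Callable[[Sequence[float], int], Vector],
) -> Callable[[Sequence[float], int], Vector]:
    """Wrap a pre-trained teacher so it only ever sees the encoded input."""

    def coded_teacher(pixels: Sequence[float], levels: int) -> Vector:
        # Assumption: the per-input encoding strength (`levels`) is chosen
        # elsewhere; the abstract does not describe the adaptation rule.
        return teacher(encoder(pixels, levels))

    return coded_teacher


def kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 4.0,
) -> float:
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    # Toy "teacher": two logits derived from the sum of the pixel values.
    toy_teacher = lambda px: [sum(px), -sum(px)]
    coded_teacher = make_coded_teacher(toy_teacher, quantize)

    image = [0.26, 0.74]
    targets = coded_teacher(image, 5)      # teacher sees the quantized image
    student_logits = [0.0, 0.0]            # an untrained, uniform student
    print(kd_loss(student_logits, targets))
```

In this sketch the only difference from ordinary KD is where the teacher's logits come from: the student would still receive the original image, while the distillation targets are computed from the compressed one.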
Ahmed H. Salamah
University of Waterloo
Shayan Mohajer Hamidi
Sharif University of Technology
En‐hui Yang
University of Waterloo
Pattern Recognition
University of Waterloo
Salamah et al. studied this question.
synapsesocial.com/papers/68e597d8b6db6435875327a6 — DOI: https://doi.org/10.1016/j.patcog.2024.110966