April 19, 2024 | Open Access

Dynamic Temperature Knowledge Distillation

Abstract

Temperature plays a pivotal role in moderating label softness in knowledge distillation (KD). Traditional approaches often use a static temperature throughout the KD process, which fails to account for samples of varying difficulty and overlooks the distinct capabilities of different teacher-student pairings. The result is a less-than-ideal transfer of knowledge. To improve knowledge propagation, we propose Dynamic Temperature Knowledge Distillation (DTKD), which introduces dynamic, cooperative temperature control for the teacher and student models simultaneously within each training iteration. In particular, we propose "sharpness" as a metric to quantify the smoothness of a model's output distribution. By minimizing the sharpness difference between the teacher and the student, we derive sample-specific temperatures for each of them. Extensive experiments on CIFAR-100 and ImageNet-2012 demonstrate that DTKD performs comparably to leading KD techniques, with added robustness in Target Class KD and Non-target Class KD scenarios. The code is available at https://github.com/JinYu1998/DTKD.
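The idea of sample-specific temperatures can be illustrated with a minimal sketch in plain Python. The abstract does not give formulas, so everything concrete here is an assumption rather than the authors' implementation: sharpness is taken to be the log-sum-exp of the logits, the two temperatures are split around a hypothetical base temperature in proportion to each model's sharpness, and the loss is an unscaled KL divergence. The function names are illustrative only; the repository linked above is the authoritative source.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sharpness(logits):
    """ASSUMPTION: log-sum-exp of the logits stands in for the sharpness metric."""
    peak = max(logits)
    return peak + math.log(sum(math.exp(z - peak) for z in logits))


def dynamic_temperatures(teacher_logits, student_logits, base_temperature=4.0):
    """ASSUMPTION: split 2 * base_temperature in proportion to each sharpness.

    With this rule, sharpness / temperature is the same for both models,
    a simple proxy for reducing the sharpness gap between them.
    """
    x = sharpness(teacher_logits)
    y = sharpness(student_logits)
    if x <= 0 or y <= 0:
        raise ValueError("this sketch assumes positive sharpness values")
    t_teacher = 2.0 * x / (x + y) * base_temperature
    t_student = 2.0 * y / (x + y) * base_temperature
    return t_teacher, t_student


def kd_loss(teacher_logits, student_logits, base_temperature=4.0):
    """KL(teacher || student) for one sample, each side softened by its own temperature."""
    t_teacher, t_student = dynamic_temperatures(
        teacher_logits, student_logits, base_temperature
    )
    p = softmax(teacher_logits, t_teacher)
    q = softmax(student_logits, t_student)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits: a confident teacher and a less confident student.
teacher = [8.0, 2.0, 1.0]
student = [3.0, 2.0, 1.0]
t_teacher, t_student = dynamic_temperatures(teacher, student)
loss = kd_loss(teacher, student)
```

In this sketch the sharper (more confident) model receives the higher temperature, so its output is softened more, and the two temperatures always sum to twice the base temperature. When both models produce identical logits, both temperatures equal the base temperature and the setup reduces to static-temperature KD.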

Wei et al. studied this question.

synapsesocial.com/papers/68e6e65fb6db643587661a57 https://doi.org/10.48550/arxiv.2404.12711