Knowledge distillation (KD) plays a crucial role in reducing computational costs, accelerating inference, and improving model generalization. However, when there is a significant capacity gap between the teacher and student networks, KD often struggles to transfer knowledge effectively, and robustness under distribution shift remains a major challenge. To address these issues, we propose learning-to-learn knowledge distillation (L2L-KD), a dynamic temperature-controlled KD framework that progressively increases learning difficulty, mimicking how human learners advance from basic to complex concepts. To further enhance robustness and generalization, we introduce a counterfactual data augmentation technique that leverages the Metropolis–Hastings algorithm to generate fluent and semantically coherent out-of-domain (OOD) samples. We evaluate L2L-KD in in-domain, OOD, and adversarial settings; the results show that it consistently outperforms existing KD approaches while substantially improving robustness. Building on this foundation, we further extend its core learning philosophy to a new unsupervised cross-domain framework, demonstrating that the dynamic distillation principles of L2L-KD generalize naturally to broader domain adaptation tasks.
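The abstract does not say how L2L-KD adjusts the temperature during training. As a minimal sketch, the code below pairs the standard temperature-scaled distillation loss (softened softmax, KL divergence from teacher to student, scaled by τ²) with a temperature schedule. The cosine form of `temperature_schedule`, its defaults `t_start=8.0` and `t_end=1.0`, and the assumption that difficulty rises as the temperature falls (so the student must match increasingly sharp teacher distributions) are all hypothetical illustrations, not the paper's method.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float) -> list[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float],
            teacher_logits: Sequence[float],
            temperature: float) -> float:
    """Standard distillation term: tau^2 * KL(p_teacher || p_student) on softened outputs."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # Clamp the student probability so an underflow to 0.0 cannot divide by zero.
    kl = sum(pt * math.log(pt / max(ps, 1e-12)) for pt, ps in zip(p_t, p_s) if pt > 0.0)
    # The tau^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


def temperature_schedule(step: int, total_steps: int,
                         t_start: float = 8.0, t_end: float = 1.0) -> float:
    """HYPOTHETICAL cosine schedule from t_start (soft targets) down to t_end (sharp targets).

    Illustrative assumption only; not the schedule used by L2L-KD.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return t_end + 0.5 * (t_start - t_end) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    student = [2.0, 1.5, 0.2]
    for step in (0, 500, 1000):
        tau = temperature_schedule(step, total_steps=1000)
        print(f"step={step:4d}  tau={tau:.2f}  kd_loss={kd_loss(student, teacher, tau):.4f}")
```

In a typical training loop the student would minimize a weighted sum of this term and the ordinary cross-entropy on hard labels; the weighting L2L-KD uses is likewise not given in the abstract.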
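The abstract states only that the augmentation uses Metropolis–Hastings to produce fluent, semantically coherent OOD samples. A minimal sketch of the general idea, assuming word-level replacement is the only edit move, is a sampler whose target density is an unnormalized score over sentences. `log_score` is a hypothetical stand-in for whatever fluency and semantic-consistency terms the authors actually use (for example, a language-model log-likelihood plus a similarity constraint), and the toy vocabulary and scoring rule in the `__main__` section are invented for illustration.

```python
import math
import random
from typing import Callable, Sequence


def mh_word_replace(tokens: Sequence[str],
                    vocab: Sequence[str],
                    log_score: Callable[[list[str]], float],
                    steps: int = 500,
                    seed: int = 0) -> list[str]:
    """Metropolis-Hastings sampling over sentences using single-word replacements.

    Each proposal picks a position and a replacement word uniformly at random.
    Assuming every seed token is in `vocab`, this proposal is symmetric, so the
    acceptance probability reduces to min(1, pi(x') / pi(x)), computed in log space.
    `log_score` (unnormalized log pi) is a HYPOTHETICAL stand-in for fluency and
    semantic-constraint terms.
    """
    rng = random.Random(seed)
    current = list(tokens)
    if not current:
        return current
    current_ls = log_score(current)
    for _ in range(steps):
        proposal = list(current)
        proposal[rng.randrange(len(proposal))] = rng.choice(vocab)
        proposal_ls = log_score(proposal)
        # Accept with probability min(1, exp(log pi(x') - log pi(x))).
        if rng.random() < math.exp(min(0.0, proposal_ls - current_ls)):
            current, current_ls = proposal, proposal_ls
    return current


if __name__ == "__main__":
    vocab = ["the", "movie", "was", "great", "awful", "plot", "dull", "fun"]

    def toy_log_score(toks: list[str]) -> float:
        # Invented scoring rule: keep the anchor word "movie" and favour
        # words expressing the opposite attribute ("awful", "dull").
        anchor = 3.0 if "movie" in toks else -3.0
        attribute = 1.5 * sum(t in {"awful", "dull"} for t in toks)
        return anchor + attribute

    print(mh_word_replace(["the", "movie", "was", "great"], vocab, toy_log_score))
```

A fuller version would also propose insertions and deletions, which change sentence length and make the proposal asymmetric, so the acceptance ratio would need the Hastings correction q(x | x') / q(x' | x).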
Xiang et al. studied this question.