Knowledge distillation, in which pre-trained "teacher" models are used to improve the performance of subsequent "student" models, has surged in popularity. With a large number of pre-trained models now available, there is a pressing need for effective mechanisms to make the best use of them. This paper tackles two main challenges in developing such mechanisms. The first is selecting the most advantageous models from a vast pool, since exhaustively testing every pre-trained model is impractical. The second stems from the diversity of pre-trained models: a pre-trained model commonly has a label space different from that of the current task. This calls for a universal model reuse approach that can integrate any pre-trained model, regardless of its (possibly unknown) label set. To address these issues, we introduce a dual-phase framework named "Selective Cross-Label Distillation." The first phase, model assessment, measures the semantic similarity between a candidate pre-trained model and the target model through optimal transport; candidates with lower transportation cost are selected as source models. The second phase, knowledge reuse, minimizes the transportation cost between the chosen source models and the target model. This dual-phase approach enables efficient model selection and bridges the semantic gap between pre-trained models and the target task, leading to improved performance. Experimental results validate the effectiveness of the framework in both model selection and knowledge reuse.
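The model assessment phase described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each class of a candidate model and of the target task is summarized by a prototype vector in a shared embedding space, uses squared Euclidean distance as the ground cost, places uniform mass on the classes, and approximates the transportation cost with entropy-regularized (Sinkhorn) optimal transport. The function names `class_cost_matrix`, `sinkhorn_cost`, and `rank_models` are hypothetical, and the abstract does not state which cost or solver the paper uses.

```python
import math


def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def class_cost_matrix(src_protos, tgt_protos):
    """Ground cost between every source class and every target class.

    Assumption: classes are represented by prototype vectors in a shared space.
    The two label spaces may have different sizes.
    """
    return [[squared_distance(s, t) for t in tgt_protos] for s in src_protos]


def sinkhorn_cost(cost, reg=0.1, n_iters=500):
    """Entropy-regularized OT cost with uniform marginals (Sinkhorn iterations).

    `reg` is relative to the largest entry of `cost`, which keeps the kernel
    exp(-cost / (reg * scale)) away from numerical underflow.
    Returns (transport cost, transport plan).
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on source classes (assumption)
    b = [1.0 / m] * m  # uniform mass on target classes (assumption)
    scale = max(max(max(row) for row in cost), 1e-12)
    kernel = [[math.exp(-c / (reg * scale)) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale columns and rows to match the marginals.
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return total, plan


def rank_models(candidates, tgt_protos, reg=0.1):
    """Sort candidate models by ascending transportation cost to the target."""
    scored = [
        (name, sinkhorn_cost(class_cost_matrix(protos, tgt_protos), reg)[0])
        for name, protos in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1])
```

Under these assumptions, `rank_models({"model_a": protos_a, "model_b": protos_b}, target_protos)` returns the candidates ordered from lowest to highest cost, and the lowest-cost entries would be kept as source models. The second phase, in which the transportation cost is minimized during student training, is not shown here.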
Lu et al. (Thu,) studied this question.