What question did this study set out to answer?

This paper asks how to select the most useful pre-trained models from a large pool, and how to reuse their knowledge to improve a student model even when their label spaces differ from that of the target task.

June 8, 2026

Selecting and Distilling Cross-Label Models

Key Points

This paper aims to optimize the selection and integration of pre-trained models for enhanced student model performance.
Introduces a dual-phase framework named "Selective Cross-Label Distillation".
The first phase, model assessment, measures the semantic similarity between a candidate pre-trained model and the target model using optimal transport.
Candidate models with lower transportation costs are selected as source models.
The second phase, knowledge reuse, minimizes the transportation cost between the selected source models and the target model.
Together, the two phases enable efficient model selection and bridge the semantic gap between pre-trained models and the target task, which improves performance.
Experimental results validate the framework's effectiveness in model selection and knowledge reuse.

Abstract

Knowledge distillation, in which pre-trained "teacher" models are used to improve the performance of subsequent "student" models, has surged in popularity. With so many pre-trained models available, there is a pressing need for mechanisms that make the best use of them. This paper tackles two main challenges in developing such mechanisms. The first is selecting the most advantageous models from a vast pool, given that exhaustively testing every pre-trained model is impractical. The second stems from the diversity of pre-trained models: a pre-trained model commonly has a different label space from that of the current task. This calls for a universal model reuse approach that can integrate any pre-trained model, even when its label set is unknown. To address these issues, we introduce a dual-phase framework named "Selective Cross-Label Distillation." The first phase, termed model assessment, evaluates the semantic similarity between a candidate pre-trained model and the target model through optimal transport. By computing the transportation cost, we identify the candidate models with lower costs and use them as our source models. The second phase, knowledge reuse, concentrates on minimizing the transportation cost between the chosen source models and the target model. This dual-phase approach enables efficient model selection and bridges the semantic gap between pre-trained models and the target task, leading to enhanced performance. Experimental results validate the effectiveness of our framework in both model selection and knowledge reuse.
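
The abstract does not specify how the transportation cost is computed, so the following is only a minimal sketch of what the model-assessment phase might look like. It assumes that each model's classes can be represented by vectors (hypothetical class embeddings), that the ground cost is cosine distance, that all classes carry equal weight, and that an entropy-regularized Sinkhorn solver is an acceptable stand-in for whatever solver the authors use. The function names `cosine_cost`, `sinkhorn_cost`, and `rank_candidates`, and the toy data, are illustrative and are not taken from the paper.

```python
import math

def cosine_cost(src, tgt):
    """Ground cost: 1 - cosine similarity for every (source class, target class) pair."""
    def unit(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]
    src_u = [unit(v) for v in src]
    tgt_u = [unit(v) for v in tgt]
    return [[1.0 - sum(a * b for a, b in zip(s, t)) for t in tgt_u] for s in src_u]

def sinkhorn_cost(cost, reg=0.1, n_iters=500):
    """Entropy-regularized transport cost between two uniform class distributions.

    Assumption: uniform class weights and Sinkhorn iterations; the paper may differ.
    """
    n, m = len(cost), len(cost[0])
    a, b = 1.0 / n, 1.0 / m  # uniform mass per source / target class
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale so the plan matches the target, then the source, marginals.
        v = [b / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [a / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return total, plan

def rank_candidates(candidates, target, top_k=1):
    """Score every candidate model and keep the top_k with the lowest transport cost."""
    scores = {name: sinkhorn_cost(cosine_cost(vecs, target))[0]
              for name, vecs in candidates.items()}
    selected = sorted(scores, key=scores.get)[:top_k]
    return selected, scores

# Toy data (hypothetical): the target task has 3 classes; candidate label spaces differ.
target_classes = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
candidates = {
    # 4 classes, most of them semantically close to the target classes
    "near": [[1, 0.1, 0, 0], [0, 1, 0.1, 0], [0, 0, 1, 0.1], [0.7, 0.7, 0, 0]],
    # 3 classes, nearly orthogonal to every target class
    "far": [[0, 0, 0, 1], [0, 0, 0.1, 1], [0.1, 0, 0, 1]],
}
selected, scores = rank_candidates(candidates, target_classes, top_k=1)
```

Because the cost is defined between class representations rather than between matching label indices, the two label spaces need not have the same size or meaning, which is the property the cross-label setting relies on. In this sketch, a lower score plays the role of the "lower transportation cost" that marks a candidate as a source model.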
