Knowledge Distillation (KD) is a machine learning technique in which a compact student model learns to replicate the performance of a larger teacher model by mimicking its output predictions. Multi-Teacher Knowledge Distillation extends this paradigm by aggregating knowledge from multiple teacher models to improve generalization and robustness. However, effectively integrating outputs from diverse teachers, especially in the presence of noise or conflicting predictions, remains a key challenge. In this work, we propose Multi-Round Parallel Multi-Teacher Distillation (MPMTD), a framework that systematically explores and combines multiple aggregation techniques. Specifically, we investigate aggregation at different levels, including loss-based and probability-distribution-based fusion. Our framework applies different strategies across distillation rounds, enabling adaptive and synergistic knowledge transfer. Through extensive experimentation, we analyze the strengths and weaknesses of individual aggregation methods and demonstrate that strategic sequencing across rounds significantly outperforms static approaches. Notably, we introduce the Byzantine-Resilient Probability Distribution aggregation method, applied here for the first time in a KD context, which achieves state-of-the-art performance with an accuracy of 99.29% and an F1-score of 99.27%. We further identify optimal configurations in terms of the number of distillation rounds and the ordering of aggregation strategies, balancing accuracy with computational efficiency. Our contributions include (i) the introduction of advanced aggregation strategies into the KD setting, (ii) a systematic evaluation of their performance, and (iii) practical recommendations for real-world deployment. These findings have significant implications for distributed learning, edge computing, and IoT environments, where efficient and resilient model compression is essential.
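To make the two aggregation levels named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify how the Byzantine-Resilient Probability Distribution method works, so a coordinate-wise median with renormalization is swapped in here as an assumed, commonly used robust aggregator. All function names, the temperature value, and the toy logits are hypothetical and chosen only for illustration.

```python
import math
from statistics import median


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, as commonly used to soften KD targets."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mean_fusion(teacher_probs):
    """Distribution-level fusion: plain average of teacher distributions."""
    n = len(teacher_probs)
    return [sum(col) / n for col in zip(*teacher_probs)]


def median_fusion(teacher_probs):
    """ASSUMPTION: coordinate-wise median, renormalized to sum to 1.

    This is a stand-in for a Byzantine-resilient aggregator; the paper's
    actual method may differ.
    """
    med = [median(col) for col in zip(*teacher_probs)]
    total = sum(med)
    return [m / total for m in med]


def loss_level_aggregate(teacher_probs, student_probs):
    """Loss-level fusion: average the per-teacher KD losses."""
    losses = [kl_divergence(t, student_probs) for t in teacher_probs]
    return sum(losses) / len(losses)


def distribution_level_loss(teacher_probs, student_probs, fuse):
    """Distribution-level fusion: fuse the targets first, then one KD loss."""
    return kl_divergence(fuse(teacher_probs), student_probs)


# Toy setup (hypothetical values): two agreeing teachers and one faulty one.
T = 2.0
honest = [softmax([4.0, 1.0, 0.5], T), softmax([3.5, 1.2, 0.3], T)]
faulty = softmax([0.0, 0.0, 6.0], T)
teachers = honest + [faulty]
student = softmax([2.0, 1.0, 0.5], T)

fused_mean = mean_fusion(teachers)
fused_median = median_fusion(teachers)

print("mean fusion:  ", [round(p, 4) for p in fused_mean])
print("median fusion:", [round(p, 4) for p in fused_median])
print("loss-level KD loss:", loss_level_aggregate(teachers, student))
print("distribution-level KD loss (median):",
      distribution_level_loss(teachers, student, median_fusion))
```

In this toy setup the faulty teacher pulls the averaged target toward the wrong class, while the median-based target stays close to the two agreeing teachers. A multi-round schedule in the spirit of the abstract could switch the `fuse` argument, or switch between loss-level and distribution-level aggregation, from one round to the next; the ordering that works best is an empirical question the paper addresses.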
Ahmed Hamdi
Hassan Noura
Joseph Azar
Applied System Innovation
Franche-Comté Électronique Mécanique Thermique et Optique - Sciences et Technologies
Hamdi et al. studied this question.
www.synapsesocial.com/papers/68dd91d5fe798ba2fc499120 — DOI: https://doi.org/10.3390/asi8050146