The Centralized Teacher with Decentralized Student (CTDS) framework is a multi-agent reinforcement learning (MARL) approach that applies knowledge distillation within the Centralized Training with Decentralized Execution (CTDE) paradigm. In this framework, a teacher module learns optimal Q-values from global observations and distills this knowledge into a student module that operates on local information only. However, CTDS suffers from an inefficient knowledge distillation process and a performance gap between the teacher and student modules. This paper proposes an evolutionary sampling method that uses a genetic algorithm to optimize selective knowledge distillation in the CTDS framework. Our approach uses a selective sampling strategy that focuses distillation on samples with large Q-value differences between the teacher and student models. The genetic algorithm optimizes adaptive sampling ratios, with each chromosome representing a sequence of sampling ratios. This evolutionary search discovers adaptive sampling sequences that minimize the teacher–student performance gap. Experiments in the StarCraft Multi-Agent Challenge (SMAC) environment show that our method outperforms the existing CTDS methods, addressing both the inefficiency of knowledge distillation and the teacher–student performance gap.
Jo et al. studied this question.
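The two ideas described above, selecting samples by teacher-student Q-value gap and evolving a sequence of sampling ratios, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `select_samples` and `evolve_ratios`, the truncation selection, uniform crossover, Gaussian mutation, the ratio bounds, and the toy fitness function are all assumptions made for illustration. In a real setting the fitness of a chromosome would presumably come from training the student with that ratio sequence and measuring its performance, which is replaced here by a stand-in.

```python
import random


def select_samples(q_teacher, q_student, ratio):
    """Return the indices of the `ratio` fraction of samples with the
    largest absolute teacher-student Q-value gap (at least one sample)."""
    gaps = [abs(t - s) for t, s in zip(q_teacher, q_student)]
    k = max(1, round(ratio * len(gaps)))
    ranked = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return sorted(ranked[:k])


def evolve_ratios(fitness, seq_len, pop_size=20, generations=30,
                  mutation_std=0.1, low=0.05, high=1.0, seed=0):
    """Evolve a sequence of sampling ratios (one chromosome = one sequence).

    Hypothetical GA settings: keep the top half as parents, create children
    by uniform crossover, then apply clipped Gaussian mutation.
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(low, high) for _ in range(seq_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)       # best chromosomes first
        parents = pop[: pop_size // 2]            # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from either parent.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Gaussian mutation, clipped to the allowed ratio range.
            child = [min(high, max(low, g + rng.gauss(0.0, mutation_std)))
                     for g in child]
            children.append(child)
        pop = parents + children                  # parents survive (elitism)
    return max(pop, key=fitness)


# Stand-in fitness (assumption): closeness to an arbitrary target sequence.
# A real fitness would evaluate the student trained with these ratios.
target = [0.8, 0.5, 0.2]


def toy_fitness(chromosome):
    return -sum((g - t) ** 2 for g, t in zip(chromosome, target))


best_ratios = evolve_ratios(toy_fitness, seq_len=3)
# Use the first evolved ratio to pick which samples to distill.
chosen = select_samples([1.0, 5.0, 2.0, 0.0], [1.1, 1.0, 2.0, 3.0],
                        best_ratios[0])
```

Here each gene is read as the sampling ratio for one training stage, so a chromosome of length 3 would describe three successive stages. That interpretation of the "sampling ratio sequence" is also an assumption.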