August 25, 2025 · Open Access

Evolutionary Sampling for Knowledge Distillation in Multi-Agent Reinforcement Learning

Key Points

The evolutionary sampling method reduces the performance gap between teacher and student modules in multi-agent settings and makes knowledge distillation more efficient.
Experimental results show improvements in Q-value accuracy, with the method achieving superior performance in the StarCraft Multi-Agent Challenge environment.
This study employs genetic algorithms to refine selective sampling strategies that optimize knowledge distillation in decentralized frameworks.
Findings support the potential of evolutionary methods to address inefficiencies in multi-agent reinforcement learning tasks.

Abstract

The Centralized Teacher with Decentralized Student (CTDS) framework is a multi-agent reinforcement learning (MARL) approach that uses knowledge distillation within the Centralized Training with Decentralized Execution (CTDE) paradigm. In this framework, a teacher module learns optimal Q-values using global observations and distills this knowledge to a student module that operates with only local information. However, CTDS has limitations, including an inefficient knowledge distillation process and performance gaps between the teacher and student modules. This paper proposes an evolutionary sampling method that employs genetic algorithms to optimize selective knowledge distillation in CTDS frameworks. Our approach uses a selective sampling strategy that focuses on samples with large Q-value differences between the teacher and student models. The genetic algorithm optimizes adaptive sampling ratios through an evolutionary process in which each chromosome represents a sequence of sampling ratios. This evolutionary optimization discovers adaptive sampling sequences that minimize the teacher–student performance gap. Experimental validation in the StarCraft Multi-Agent Challenge (SMAC) environment confirms that our method achieves superior performance compared with existing CTDS methods. The approach thereby addresses both the inefficiency of knowledge distillation and the teacher–student performance gap while improving overall performance.
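The two ideas in the abstract, selecting samples by teacher-student Q-value difference and evolving a sequence of sampling ratios, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the GA operators (truncation selection, one-point crossover, Gaussian mutation), the ratio bounds, and especially `toy_fitness` are assumptions chosen for illustration. In the paper's setting, fitness would presumably come from training a student with a given schedule and measuring the teacher-student gap.

```python
import random

def select_for_distillation(q_teacher, q_student, ratio):
    """Return indices of the samples with the largest |Q_teacher - Q_student|.

    `ratio` in (0, 1] is the fraction of the batch kept for distillation.
    """
    gaps = [abs(t - s) for t, s in zip(q_teacher, q_student)]
    k = max(1, round(ratio * len(gaps)))
    ranked = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return ranked[:k]

def evolve_ratio_schedule(fitness, n_stages=5, pop_size=20, generations=30,
                          mutation_rate=0.2, seed=0):
    """Hypothetical GA: each chromosome is a sequence of sampling ratios."""
    rng = random.Random(seed)
    population = [[rng.uniform(0.05, 1.0) for _ in range(n_stages)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the fitter half unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        elite = population[:pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_stages)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Gaussian mutation, clipped to the assumed ratio bounds.
            child = [min(1.0, max(0.05, g + rng.gauss(0, 0.1)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = elite + children
    return max(population, key=fitness)

def toy_fitness(schedule):
    # Placeholder only (assumption): distance to an arbitrary target schedule.
    # A real fitness would train a student and measure the performance gap.
    target = [0.8, 0.6, 0.4, 0.3, 0.2]
    return -sum((r - t) ** 2 for r, t in zip(schedule, target))

# Keep the half of a four-sample batch where teacher and student disagree most.
picked = select_for_distillation([1.0, 2.0, 3.0, 4.0], [1.1, 0.0, 3.0, 5.0], 0.5)
print(picked)  # [1, 3]

best_schedule = evolve_ratio_schedule(toy_fitness)
print([round(r, 2) for r in best_schedule])
```

Each entry of the evolved schedule would be passed as `ratio` to `select_for_distillation` at the corresponding training stage, so the fraction of samples used for distillation can change over the course of training.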

Cite This Study

Jo et al. studied this question.

synapsesocial.com/papers/68af6216ad7bf08b1eae3936 https://doi.org/10.3390/math13172734
