In cooperative multi-agent reinforcement learning, agents jointly optimize a centralized value function based on the rewards shared by all agents and learn decentralized policies through value function decomposition. Although this learning framework is considered effective, estimating each agent's individual contribution from the shared rewards, which is essential for learning highly cooperative behaviors, is difficult. The problem becomes more challenging when reinforcement and punishment, which respectively increase and decrease specific behaviors of agents, coexist, because the processes of maximizing reinforcement and minimizing punishment often conflict in practice. This study proposes a novel exploration scheme called multi-agent decomposed reward-based exploration (MuDE), which preferentially explores the action spaces associated with positive sub-rewards based on a modified reward decomposition scheme, thus effectively exploring action spaces not reachable by existing exploration schemes. We evaluate MuDE on a challenging set of StarCraft II micromanagement tasks and modified predator-prey tasks extended to include reinforcement and punishment. The results show that MuDE accurately estimates sub-rewards and outperforms state-of-the-art approaches in both convergence speed and win rates.
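To make the idea of preferring actions associated with positive sub-rewards more concrete, the following is a minimal sketch and not the MuDE algorithm itself, whose details the abstract does not give. It assumes each agent already has per-action utility estimates and per-action estimates of a positive sub-reward; the function name `biased_explore`, its parameters, and the small probability floor are all hypothetical choices made for illustration.

```python
import random


def biased_explore(q_values, pos_sub_rewards, epsilon, rng=random):
    """Epsilon-greedy action choice with a biased exploratory branch.

    Assumption (not from the paper): `pos_sub_rewards[a]` is an estimate of
    the positive sub-reward associated with action `a`.
    """
    n = len(q_values)
    if rng.random() >= epsilon:
        # Exploit: pick the greedy action under the agent's own utility.
        return max(range(n), key=lambda a: q_values[a])
    # Explore: weight each action by its clipped positive sub-reward, plus a
    # small floor so every action keeps a nonzero sampling probability.
    weights = [max(r, 0.0) + 1e-3 for r in pos_sub_rewards]
    return rng.choices(range(n), weights=weights, k=1)[0]
```

With `epsilon=0.0` the function reduces to greedy selection; with `epsilon=1.0` it always samples from the sub-reward-weighted distribution, so actions with a larger estimated positive sub-reward are tried more often than under uniform exploration.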
Byunghyun Yoo
Sungwon Yi
Hyunwoo Kim
Neural Networks
Electronics and Telecommunications Research Institute
Yoo et al. studied this question.
www.synapsesocial.com/papers/68e5f84db6db64358758c4c9 — DOI: https://doi.org/10.1016/j.neunet.2024.106565