Cooperative multi‐agent reinforcement learning (CMARL) has become a powerful paradigm for applications in autonomous driving, smart grids, and distributed robotics. However, its increasing adoption in safety‐critical scenarios raises serious concerns about security vulnerabilities, particularly backdoor attacks, in which embedded malicious behaviors are activated only under specific triggers. This paper presents TReS‐BD, a trigger‐aware reward shaping framework that systematically exploits CMARL vulnerabilities at the reward level. By formulating backdoor manipulation as a Bayes‐Adaptive Markov Decision Process (BAMDP), TReS‐BD treats trigger conditions as latent variables and integrates KL‐regularized policy optimization to maintain nominal task performance while inducing adversarial behaviors under triggers. Extensive experiments on standard benchmarks, including multi‐agent particle environments (MPE) and the StarCraft Multi‐Agent Challenge (SMAC), with the representative algorithms MADDPG and MAPPO, demonstrate that TReS‐BD achieves over 90% attack success while poisoning less than 1% of the training data, maintaining normal performance, and evading current detection mechanisms. Furthermore, a systematic analysis of existing defenses reveals their ineffectiveness against such reward‐level attacks, underscoring a critical blind spot in current CMARL security frameworks. Our findings highlight the urgent need for robust defense strategies and provide new insights into safeguarding cooperative multi‐agent systems against stealthy and efficient backdoor threats.
Zhu et al. studied this question.