What question did this study set out to answer?

The study addresses security vulnerabilities in cooperative multi-agent reinforcement learning (CMARL), focusing on backdoor attacks.

March 10, 2026 · Open Access

TReS‐BD: Trigger‐Aware Reward Shaping for Efficient and Stealthy Backdoor Attacks in Cooperative Multi‐Agent RL

Key Points

Developed TReS-BD, a trigger-aware reward shaping framework for backdoor attacks in CMARL.
Formulated backdoor manipulation as a Bayes-Adaptive Markov Decision Process (BAMDP) that treats trigger conditions as latent variables.
Integrated KL-regularized policy optimization to preserve nominal task performance while inducing adversarial behavior under triggers.
Conducted extensive experiments on the multi-agent particle environments (MPE) and the StarCraft Multi-Agent Challenge (SMAC) with MADDPG and MAPPO.
Achieved over 90% attack success while poisoning less than 1% of the training data.
Maintained nominal task performance while evading current detection mechanisms.
Showed that existing defenses are ineffective against reward-level attacks.
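The headline figures above (attack success rate, fraction of training data poisoned, retained task performance) are ratios over evaluation episodes and training transitions. The following is a minimal sketch of one common way to compute them; the `EpisodeResult` record, the function names, and the example numbers are hypothetical and do not come from the paper, whose exact metric definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    """Outcome of one evaluation episode (hypothetical record format)."""
    triggered: bool        # was the trigger present in this episode?
    target_behavior: bool  # did the agents show the attacker's target behavior?
    team_return: float     # cooperative task return for the episode


def attack_success_rate(episodes: list[EpisodeResult]) -> float:
    """Fraction of triggered episodes in which the target behavior occurred."""
    triggered = [e for e in episodes if e.triggered]
    if not triggered:
        raise ValueError("no triggered episodes to evaluate")
    return sum(e.target_behavior for e in triggered) / len(triggered)


def clean_return(episodes: list[EpisodeResult]) -> float:
    """Mean team return over trigger-free episodes (nominal task performance)."""
    clean = [e.team_return for e in episodes if not e.triggered]
    if not clean:
        raise ValueError("no clean episodes to evaluate")
    return sum(clean) / len(clean)


def poisoning_rate(num_poisoned: int, num_total: int) -> float:
    """Fraction of training transitions whose reward signal was modified."""
    if num_total <= 0:
        raise ValueError("num_total must be positive")
    return num_poisoned / num_total


# Illustrative numbers only, not results from the study.
episodes = [
    EpisodeResult(True, True, -3.0),
    EpisodeResult(True, True, -2.5),
    EpisodeResult(True, False, 4.0),
    EpisodeResult(False, False, 10.0),
    EpisodeResult(False, False, 12.0),
]
print(attack_success_rate(episodes))  # 2 of 3 triggered episodes succeed
print(clean_return(episodes))         # 11.0
print(poisoning_rate(800, 100_000))   # 0.008, i.e. 0.8% of transitions
```

Reporting clean return alongside attack success matters because a stealthy backdoor is only meaningful if trigger-free performance stays close to that of an unpoisoned baseline.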

Abstract

Cooperative multi‐agent reinforcement learning (CMARL) has become a powerful paradigm for applications in autonomous driving, smart grids, and distributed robotics. However, its increasing adoption in safety‐critical scenarios raises severe concerns about security vulnerabilities, particularly backdoor attacks, where malicious behaviors are embedded and activated only under specific triggers. This paper presents TReS‐BD, a trigger‐aware reward shaping framework that systematically exploits CMARL vulnerabilities at the reward level. By formulating backdoor manipulation as a Bayes‐Adaptive Markov Decision Process (BAMDP), TReS‐BD treats trigger conditions as latent variables and integrates KL‐regularized policy optimization to maintain nominal task performance while inducing adversarial behaviors under triggers. Extensive experiments on standard benchmarks, including multi‐agent particle environments (MPE) and StarCraft Multi‐Agent Challenge (SMAC), with representative algorithms MADDPG and MAPPO, demonstrate that TReS‐BD achieves over 90% attack success with less than 1% training data poisoning, while maintaining normal performance and evading current detection mechanisms. Furthermore, a systematic analysis of existing defenses reveals their ineffectiveness against such reward‐level attacks, underscoring a critical blind spot in current CMARL security frameworks. Our findings highlight the urgent need for robust defense strategies and provide new insights into safeguarding cooperative multi‐agent systems against stealthy and efficient backdoor threats.
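The abstract combines two ingredients: a latent trigger variable, in the spirit of a Bayes-adaptive MDP where the agent keeps a posterior belief over hidden quantities, and a KL penalty that keeps the learned policy close to a reference policy. The following is a minimal sketch of both pieces in isolation, assuming a binary trigger z updated by Bayes' rule, categorical action distributions, and a hypothetical belief-weighted reward of the form (1 − b)·r_task + b·r_adv − β·KL(π ‖ π_ref). This is not the paper's objective; the names `update_trigger_belief`, `kl_categorical`, and `shaped_reward` and the mixing form are illustrative assumptions.

```python
import math


def update_trigger_belief(prior: float, lik_trigger: float, lik_clean: float) -> float:
    """Bayes' rule for a binary latent trigger z (assumed model).

    prior:       P(z = 1) before the new observation
    lik_trigger: p(obs | z = 1)
    lik_clean:   p(obs | z = 0)
    Returns the posterior P(z = 1 | obs).
    """
    num = prior * lik_trigger
    den = num + (1.0 - prior) * lik_clean
    if den == 0.0:
        raise ValueError("observation has zero likelihood under both hypotheses")
    return num / den


def kl_categorical(p: list[float], q: list[float]) -> float:
    """KL(p || q) in nats for two distributions over the same discrete actions."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same actions")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # 0 * log(0 / q) is taken to be 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def shaped_reward(task_r: float, adv_r: float, belief: float,
                  policy: list[float], ref_policy: list[float], beta: float) -> float:
    """Hypothetical belief-weighted reward minus a KL penalty toward ref_policy."""
    mixed = (1.0 - belief) * task_r + belief * adv_r
    return mixed - beta * kl_categorical(policy, ref_policy)


# Example: a weak prior on the trigger is revised after an informative observation.
b = update_trigger_belief(prior=0.01, lik_trigger=0.9, lik_clean=0.05)
print(round(b, 4))  # 0.1538
print(shaped_reward(1.0, -1.0, b, [0.7, 0.3], [0.5, 0.5], beta=0.1))
```

In this toy form, a belief near zero leaves the task reward dominant, while β controls how strongly the policy is pulled back toward the reference; the paper's actual trade-off between nominal and adversarial behavior may be formulated differently.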

Cite This Study

Zhu et al. studied this question.

synapsesocial.com/papers/69af94da70916d39fea4bd93 https://doi.org/10.1049/ise2/3609361
