Applying multi-agent reinforcement learning (MARL) to real-world scenarios is challenging because agents often need to adapt quickly to unexpected situations, including those rarely or never encountered in training. Recent methods for out-of-distribution generalization are unsuitable for applications on out-of-distribution tasks with limited communication, because they are typically restricted to centralized training or some specialized instances of distribution shifts. To address this limitation, we introduce the Unexpectedness Encoding Scheme, a new decentralized MARL algorithm in which agents communicate ‘‘unexpectedness,’’ the surprising aspects of the environment. In addition to sending their usual reward-driven messages, each agent predicts the next observation based on past experience and then compares this prediction with the actual outcome. The discrepancy between the two is encoded as a message, enabling agents to adapt more effectively to sudden or extreme changes. Experimental results on multi-agent cooperative tasks demonstrate that our method adapts robustly to both dynamically changing training environments and previously unseen out-of-distribution scenarios.
Lee et al. (Thu,) studied this question.