What question did this study set out to answer?

This research aims to develop an efficient algorithm for policy adaptation in non-stationary environments within multi-agent systems.

May 29, 2026 · Open Access

Mixture of orthogonal experts: a novel approach to multi-agent fast policy adaptation

Key Points

The proposed algorithm, MOFA, integrates a mixture of experts (MoE) with value function decomposition.
The Gram-Schmidt process is used to maintain the independence of expert subspaces.
Mutual information between local and global information is maximized through a variational lower bound, improving agents' inference of global state features.
MOFA outperforms several state-of-the-art algorithms in multi-task learning.
MOFA also shows significant advantages in zero-shot generalization in two competitive environments.
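
The mutual-information key point above can be illustrated with a standard variational lower bound of the Barber-Agakov type. This is only a sketch: the symbols $s$ (global state), $\tau$ (an agent's local information), and $q_\phi$ (a learned variational decoder) are assumptions introduced here, and the summary does not state which pair of variables or which exact bound the paper uses.

```latex
\begin{aligned}
I(s;\tau) &= H(s) - H(s \mid \tau) \\
          &= H(s) + \mathbb{E}_{p(s,\tau)}\big[\log q_\phi(s \mid \tau)\big]
             + \mathbb{E}_{p(\tau)}\Big[ D_{\mathrm{KL}}\big(p(s \mid \tau)\,\|\,q_\phi(s \mid \tau)\big) \Big] \\
          &\ge H(s) + \mathbb{E}_{p(s,\tau)}\big[\log q_\phi(s \mid \tau)\big].
\end{aligned}
```

Because the KL term is non-negative, maximizing the expected log-likelihood of $q_\phi$ tightens the bound. Under these assumptions, this amounts to training each agent to predict global state features from its local information.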

Abstract

Cooperative multi-agent reinforcement learning (MARL) has been widely applied in various complex decision-making domains due to its exceptional coordination capabilities. However, existing methods primarily focus on single-task scenarios or, in competitive settings, on fixed-policy opponents, making them less effective in non-stationary environments where tasks or opponent policies change dynamically. In this paper, we propose MOFA, a novel algorithm based on the centralized training and decentralized execution (CTDE) framework. By combining mixture of experts (MoE) and value function decomposition, MOFA achieves fast policy adaptation in partially observable environments. Specifically, we integrate a shared-parameter MoE module into the agent networks. The Gram-Schmidt process is used to maintain the independence of expert subspaces, facilitating the extraction of transferable policy skills across diverse tasks. To enhance activation efficiency in the expert modules, we use sparsemax to produce sparse probability distributions, ensuring that only a few relevant experts are active at once. Since partial observability induces an information bottleneck, we maximize the mutual information (MI) between local and global information. This is formalized as the optimization of a variational lower bound on the MI, which enhances decentralized agents’ capability to infer global state features from limited local percepts. Experimental results demonstrate that, in two typical competitive environments, MOFA exhibits significant advantages over multiple state-of-the-art algorithms in both multi-task learning and zero-shot generalization.
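
The two mechanisms named in the abstract, Gram-Schmidt orthogonalization of expert outputs and sparsemax gating, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of one output vector per expert, and the final weighted sum are all hypothetical choices made here for clarity.

```python
import numpy as np


def gram_schmidt(vectors: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Orthonormalize the rows of `vectors` (one row per expert, assumed layout)."""
    basis = []
    for v in vectors.astype(float):
        w = v.copy()
        for b in basis:
            # Remove the component of w that lies along an earlier direction.
            w -= np.dot(w, b) * b
        norm = np.linalg.norm(w)
        if norm < eps:
            raise ValueError("expert outputs are numerically linearly dependent")
        basis.append(w / norm)
    return np.stack(basis)


def sparsemax(z: np.ndarray) -> np.ndarray:
    """Euclidean projection of a 1-D score vector onto the probability simplex."""
    z_sorted = np.sort(z)[::-1]               # scores in descending order
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum       # True for a prefix of the sorted scores
    k_z = k[support][-1]                      # size of the support
    tau = (cumsum[k_z - 1] - 1) / k_z         # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)


# Hypothetical usage: 4 experts, each emitting an 8-dimensional feature vector.
rng = np.random.default_rng(0)
expert_outputs = rng.normal(size=(4, 8))
basis = gram_schmidt(expert_outputs)          # rows are mutually orthonormal
gate = sparsemax(np.array([1.0, 0.8, 0.1, -0.5]))  # approximately [0.6, 0.4, 0.0, 0.0]
mixed = gate @ basis                          # sparse mixture of orthogonal experts
```

Unlike softmax, sparsemax assigns exactly zero weight to low-scoring experts, so in this example only two of the four experts contribute to the mixture.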

Cite This Study

Fu et al. studied this question.

synapsesocial.com/papers/6a192cf8fab5b468c4415c78
https://doi.org/10.1007/s40747-026-02331-2
