Value factorization methods have become a standard tool for cooperative multi-agent reinforcement learning (MARL) in the centralized-training, decentralized-execution (CTDE) setting. QMIX, a value factorization method built on a monotonic mixing network, constrains the joint action-value function to be a monotonic mixing of per-agent utilities. This guarantees consistency with individual greedy policies but can severely limit expressiveness on tasks with non-monotonic agent interactions. This work revisits that design choice and proposes Relaxed Monotonic QMIX (R-QMIX), a simple regularized variant of QMIX that encourages but does not strictly enforce the monotonicity constraint. R-QMIX removes the sign constraints on the mixing network weights and introduces a differentiable penalty on negative partial derivatives of the joint value with respect to each agent's utility. This preserves the computational benefits of value factorization while allowing the joint value to deviate from strict monotonicity when beneficial. R-QMIX is implemented in PyMARL, an open-source MARL codebase, and evaluated on the StarCraft Multi-Agent Challenge (SMAC). On a simple map (3m), R-QMIX matches the asymptotic performance of QMIX while learning substantially faster. On more challenging maps (MMM2, 6h vs. 8z, and 27m vs. 30m), R-QMIX significantly improves both sample efficiency and final win rate (WR): the final-quarter mean win rate rises from 42.3% to 97.1% on MMM2, from 0.0% to 57.5% on 6h vs. 8z, and from 58.0% to 96.6% on 27m vs. 30m. These results suggest that soft monotonicity regularization is a practical way to bridge the gap between strictly monotonic value factorization and fully unconstrained joint value functions. A further comparison against QTRAN (Q-value transformation), a more expressive value factorization method, shows that R-QMIX achieves higher win rates and converges more reliably on the challenging SMAC maps considered.
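The soft monotonicity penalty described above can be illustrated with a minimal sketch. The abstract does not give the exact form of the penalty, so everything specific here is an assumption made for illustration: a squared hinge on negative partial derivatives, a toy two-layer ELU mixer with fixed weights (QMIX itself generates mixer weights from the global state with hypernetworks), and the hypothetical names `mix`, `partials`, `monotonicity_penalty`, `regularized_loss`, and `lam`. It uses only the Python standard library rather than the PyMARL code.

```python
import math

def elu(x):
    return x if x > 0 else math.exp(x) - 1.0

def elu_grad(x):
    return 1.0 if x > 0 else math.exp(x)

def mix(q, W1, b1, w2, b2):
    """Toy mixer: Q_tot = w2 . elu(W1^T q + b1) + b2, with no sign constraint on weights."""
    hidden = [sum(q[i] * W1[i][j] for i in range(len(q))) + b1[j]
              for j in range(len(b1))]
    q_tot = sum(w2[j] * elu(hidden[j]) for j in range(len(w2))) + b2
    return q_tot, hidden

def partials(q, W1, b1, w2, b2):
    """Analytic dQ_tot/dq_i for each agent i (chain rule through the ELU layer)."""
    _, hidden = mix(q, W1, b1, w2, b2)
    return [sum(w2[j] * elu_grad(hidden[j]) * W1[i][j] for j in range(len(w2)))
            for i in range(len(q))]

def monotonicity_penalty(grads):
    """Assumed penalty: mean squared hinge on negative partial derivatives."""
    return sum(max(0.0, -g) ** 2 for g in grads) / len(grads)

def regularized_loss(td_loss, grads, lam):
    """TD loss plus the soft monotonicity term; lam is a hypothetical coefficient."""
    return td_loss + lam * monotonicity_penalty(grads)

# Example: one negative first-layer weight makes Q_tot decrease in agent 1's utility.
q = [1.0, 1.0]
W1 = [[1.0, 1.0], [-2.0, 0.5]]
b1 = [0.0, 0.0]
w2 = [1.0, 1.0]
b2 = 0.0
grads = partials(q, W1, b1, w2, b2)
print(grads, monotonicity_penalty(grads))
```

With all mixer weights non-negative, every partial derivative is non-negative and the penalty vanishes, which recovers the strictly monotonic QMIX case. The penalty is positive only where the joint value decreases in some agent's utility, so the coefficient `lam` controls how strongly such deviations are discouraged.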
O’Brien et al. (Wed,) studied this question.