
December 11, 2025

Hypernetwork parameter optimization of QMIX based on differential evolution

Key Points

This research aims to enhance the efficiency of QMIX in complex multi-agent environments.
Proposes DE-QMIX, which optimizes the hypernetwork parameters of QMIX using differential evolution.
Applies mutation, crossover, and selection to a population of individuals that encode hypernetwork parameters.
Updates the hypernetwork parameters by gradient descent and feeds them back into the current population.
DE-QMIX achieves a higher average winning rate on SMAC than QMIX and other methods such as MAVEN, VDN, and QTRAN.
A more accurate joint action-value function guides agents toward better actions and a higher global reward.
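The key points describe a population of hypernetwork parameter vectors evolved by mutation, crossover, and selection, with gradient-descent updates fed back into the population. The sketch below illustrates that loop under stated assumptions: it uses the classic DE/rand/1/bin scheme (the source does not say which DE variant DE-QMIX uses), a toy `fitness` function stands in for whatever score is used to evaluate hypernetwork parameters (e.g. win rate or return), and the feedback step is modeled hypothetically as replacing the worst individual with a gradient-updated copy of the best. The names `de_generation`, `gradient_feedback`, `F`, `CR`, and `lr` are illustrative, not taken from the paper.

```python
import random

def fitness(x):
    # Toy stand-in (assumption): DE-QMIX would evaluate hypernetwork
    # parameters by agent performance. Higher is better here.
    return -sum(v * v for v in x)

def de_generation(pop, F=0.5, CR=0.9, rng=random):
    """One DE/rand/1/bin generation: mutation, binomial crossover, greedy selection."""
    dim = len(pop[0])
    new_pop = []
    for i, target in enumerate(pop):
        # Mutation: combine three distinct individuals other than the target.
        a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
        mutant = [a[k] + F * (b[k] - c[k]) for k in range(dim)]
        # Binomial crossover; j_rand guarantees at least one gene from the mutant.
        j_rand = rng.randrange(dim)
        trial = [mutant[k] if (rng.random() < CR or k == j_rand) else target[k]
                 for k in range(dim)]
        # Selection: keep whichever of trial and target has the higher fitness.
        new_pop.append(trial if fitness(trial) >= fitness(target) else target)
    return new_pop

def gradient_feedback(pop, grad_fn, lr=0.1):
    """Hypothetical feedback step: take one gradient-descent step on the loss
    from the best individual and let the result replace the worst individual."""
    best = max(pop, key=fitness)
    updated = [w - lr * g for w, g in zip(best, grad_fn(best))]
    worst_idx = min(range(len(pop)), key=lambda i: fitness(pop[i]))
    pop = list(pop)
    pop[worst_idx] = updated
    return pop

# Usage: 10 individuals, each a 4-dimensional "parameter vector".
rng = random.Random(0)
pop = [[rng.uniform(-5, 5) for _ in range(4)] for _ in range(10)]
start = max(map(fitness, pop))
for _ in range(50):
    pop = de_generation(pop, rng=rng)
    # Loss is sum(x^2) = -fitness, so its gradient is 2x.
    pop = gradient_feedback(pop, grad_fn=lambda x: [2 * v for v in x])
print(start, max(map(fitness, pop)))
```

Greedy selection means no individual's fitness can drop within a generation, so the gradient step is the only place new information enters from outside the evolutionary search; how the paper schedules and merges the two is not specified here.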

Abstract

Q-value Mixing (QMIX) is a widely used algorithm for multi-agent reinforcement learning. However, multi-agent environments are complex, with high-dimensional state and action spaces, which leads to low exploration efficiency and sparse global rewards in the early stages of QMIX training. To address this issue, we propose DE-QMIX, an efficient method that optimizes the hypernetwork parameters of QMIX using differential evolution. DE-QMIX encodes the hypernetwork parameters of QMIX as population individuals and searches for the best parameters by applying mutation, crossover, and selection to these individuals. The hypernetworks also update their parameters by gradient descent and feed the updated parameters back into the current population, improving the overall efficiency of DE-QMIX. With optimized hypernetwork parameters, the joint action-value function $Q_{total}$ fitted by the mixing network more accurately reflects the decision quality of the whole multi-agent system. This guides individual agents away from invalid or inefficient actions during exploration and speeds up learning; a more accurate $Q_{total}$ in turn leads agents to choose better actions and raises the global reward. Experiments on the StarCraft Multi-Agent Challenge (SMAC) platform demonstrate that DE-QMIX achieves a higher average winning rate and global reward than QMIX and other existing approaches, including Multi-Agent Variational Exploration (MAVEN), Value-Decomposition Networks (VDN), and Joint Q-Function Transformation (QTRAN).
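The abstract's central mechanism is that each DE individual is a flat vector of hypernetwork parameters, and those hypernetworks generate the mixing-network weights from which $Q_{total}$ is computed. The pure-Python sketch below shows that decoding, assuming the standard QMIX mixer design (state-conditioned hypernetworks, absolute-value weights to keep $Q_{total}$ monotonic in each agent's Q-value, an ELU hidden layer) with every hypernetwork reduced to a single linear layer for brevity. The sizes `N_AGENTS`, `STATE_DIM`, `EMBED` and the helpers `n_params`, `linear`, `q_total` are hypothetical, not the paper's architecture.

```python
import math
import random

N_AGENTS, STATE_DIM, EMBED = 3, 2, 4  # hypothetical sizes

def n_params():
    # Output sizes of hyper_w1, hyper_b1, hyper_w2, hyper_b2; each is assumed
    # to be one linear layer (weights + biases) taking the global state.
    sizes = [N_AGENTS * EMBED, EMBED, EMBED, 1]
    return sum(STATE_DIM * out + out for out in sizes)

def linear(theta, offset, x, out_dim):
    """Apply a linear layer whose weights and biases are read from theta at offset."""
    in_dim = len(x)
    w = theta[offset:offset + in_dim * out_dim]
    b = theta[offset + in_dim * out_dim:offset + in_dim * out_dim + out_dim]
    y = [sum(x[i] * w[i * out_dim + j] for i in range(in_dim)) + b[j]
         for j in range(out_dim)]
    return y, offset + in_dim * out_dim + out_dim

def elu(v):
    return v if v > 0 else math.exp(v) - 1.0

def q_total(theta, agent_qs, state):
    """QMIX-style mixing: decode theta into hypernetworks, then mix agent Q-values."""
    off = 0
    w1, off = linear(theta, off, state, N_AGENTS * EMBED)
    b1, off = linear(theta, off, state, EMBED)
    w2, off = linear(theta, off, state, EMBED)
    b2, off = linear(theta, off, state, 1)
    # abs() keeps mixing weights non-negative, so Q_total never decreases
    # when any single agent's Q-value increases.
    hidden = [elu(sum(agent_qs[a] * abs(w1[a * EMBED + k]) for a in range(N_AGENTS)) + b1[k])
              for k in range(EMBED)]
    return sum(h * abs(w) for h, w in zip(hidden, w2)) + b2[0]

# Usage: one candidate individual, i.e. one flat hypernetwork parameter vector.
rng = random.Random(0)
theta = [rng.gauss(0.0, 0.5) for _ in range(n_params())]
state = [0.3, -1.2]
base = q_total(theta, [1.0, 0.5, -0.2], state)
bumped = q_total(theta, [1.5, 0.5, -0.2], state)
print(base, bumped)  # bumped is not less than base (monotonic mixing)
```

Because the mixer depends on a candidate only through `theta`, the same flat vector can be manipulated by DE operators and, in a real implementation, also trained by gradient descent before being written back into the population.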
