What question did this study set out to answer?

The research aims to develop a framework that effectively addresses multiple conflicting objectives in reinforcement learning.

April 17, 2026 · Open Access

MO-CoERL: Multi-objective cooperative evolutionary deep reinforcement learning

Key Points

Aimed to develop a framework for reinforcement learning with multiple conflicting objectives.
Introduced the MO-CoERL framework, which combines cooperative coevolution with actor–critic learning.
Used a CAPQL backbone for gradient-based policy refinement.
Used a global Pareto archive for hypervolume-guided feedback and diversity maintenance.
Ran experiments on four continuous-control MuJoCo benchmarks (Hopper, Walker2d, Swimmer, and Ant).
Achieved a 41.72% higher Expected Utility Metric (EUM) than CAPQL on average.
Improved Hypervolume (HV) by 66.89% on average across the benchmarks.
Increased HV by up to 89.52% on Hopper.
Increased HV by 173.15% on Walker2d.
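
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how they are commonly computed, not the paper's evaluation code. It assumes maximization, linear utilities over sampled preference weights for EUM, and exactly two objectives for HV. The function names `expected_utility` and `hypervolume_2d`, the reference point, and the sample data are illustrative assumptions.

```python
import numpy as np


def expected_utility(returns, weights):
    """Mean over preference weights of the best linear utility any policy achieves."""
    returns = np.asarray(returns, dtype=float)   # shape (n_policies, n_objectives)
    weights = np.asarray(weights, dtype=float)   # shape (n_weights, n_objectives)
    utilities = weights @ returns.T              # shape (n_weights, n_policies)
    return float(utilities.max(axis=1).mean())


def hypervolume_2d(returns, ref):
    """Area dominated by the points and bounded by ref (two objectives, maximization)."""
    pts = np.asarray(returns, dtype=float)
    ref = np.asarray(ref, dtype=float)
    pts = pts[np.all(pts > ref, axis=1)]         # ignore points that do not beat the reference
    pts = pts[np.argsort(-pts[:, 0])]            # sweep from the largest first objective
    area, best_y = 0.0, ref[1]
    for x, y in pts:
        if y > best_y:                           # this point adds a new horizontal strip
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return float(area)


# Hypothetical return vectors for three policies and three preference weights.
front = [[3.0, 1.0], [2.0, 2.0], [1.0, 3.0]]
prefs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
eum = expected_utility(front, prefs)
hv = hypervolume_2d(front, ref=[0.0, 0.0])
```

A higher EUM means that, averaged over preferences, some policy in the set serves each preference well; a higher HV means the set covers more of the objective space above the reference point.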

Abstract

Reinforcement learning systems often face tasks requiring the simultaneous optimization of multiple conflicting objectives, where traditional single-policy methods fail to capture the diversity of Pareto-optimal trade-offs. This paper introduces MO-CoERL, a Multi-Objective Cooperative Evolutionary Deep Reinforcement Learning framework that integrates cooperative coevolution with actor–critic learning to address this challenge. The proposed method combines population-based evolutionary exploration with gradient-based policy refinement through a CAPQL backbone, while a global Pareto archive enables hypervolume-guided feedback and diversity maintenance. This cooperative mechanism allows MO-CoERL to achieve stable convergence, broad Pareto coverage, and improved generalization across objectives. Experiments on four continuous-control MuJoCo benchmarks (Hopper, Walker2d, Swimmer, and Ant) demonstrate that MO-CoERL outperforms CAPQL across most benchmarks in convergence speed and front quality. On average, it achieves +41.72% higher Expected Utility Metric (EUM) and a +66.89% improvement in Hypervolume (HV). Notably, MO-CoERL yields up to an 89.52% increase in HV on Hopper and +173.15% on Walker2d, highlighting its robustness in high-dimensional and unstable tasks. These results confirm that cooperative evolution effectively complements actor–critic learning, enhancing both policy diversity and Pareto convergence. MO-CoERL provides a scalable, preference-conditioned MORL approach integrated with cooperative evolution, offering a robust foundation for advancing cooperative and population-based optimization frameworks.

• Proposes MO-CoERL, a cooperative evolutionary multi-objective RL framework.
• Integrates actor–critic learning with population-based evolutionary exploration.
• Uses a global Pareto archive for hypervolume-guided selection and feedback.
• Achieves on average +41.72% EUM and +66.89% Hypervolume gain over CAPQL on MuJoCo tasks.
• Improves convergence speed, stability, and Pareto diversity across benchmarks.
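
The abstract does not spell out how the global Pareto archive is updated, so the following is only a generic sketch of a non-dominated archive under maximization, not MO-CoERL's actual mechanism. The class name `ParetoArchive`, the helper `dominates`, and the duplicate-rejection rule are assumptions made for illustration; the hypervolume-guided feedback described in the paper is not modeled here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


class ParetoArchive:
    """Keeps only mutually non-dominated return vectors (hypothetical helper)."""

    def __init__(self) -> None:
        self.points: List[Point] = []

    def add(self, candidate: Sequence[float]) -> bool:
        """Insert candidate unless an archived point dominates or equals it."""
        cand = tuple(float(v) for v in candidate)
        if any(dominates(p, cand) or p == cand for p in self.points):
            return False
        # Drop archived points that the newcomer dominates, then store it.
        self.points = [p for p in self.points if not dominates(cand, p)]
        self.points.append(cand)
        return True


# Hypothetical two-objective returns produced by a population of policies.
archive = ParetoArchive()
kept = [archive.add(r) for r in ([1, 1], [2, 2], [1, 3], [1, 2], [2, 2])]
```

In a population-based loop, each evaluated policy's return vector would be offered to the archive, and the surviving points would form the current approximation of the Pareto front.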
