What question did this study set out to answer?

The study aims to make deep reinforcement learning for robot control more stable and robust by exploiting the morphological symmetry of robotic systems.

May 17, 2026 | Open Access

Morphological symmetry-aware generalized policy network for deep reinforcement learning

Key Points

Proposed a symmetry-assisted DRL framework for morphologically symmetric robots.
Modeled the environment as a symmetric Markov decision process (MDP).
Developed a symmetric PPO objective with a coupled importance-sampling ratio.
Outperformed existing methods on most symmetric tasks in experiments.
Maintained performance comparable to or better than standard PPO on asymmetric tasks, where symmetry is less directly exploitable.
Provided a stable and robust learning mechanism for robotic control.

Abstract

Exploiting the morphological symmetry of robotic systems, such as humanoid and quadruped robots, is a promising direction for improving robot learning. In deep reinforcement learning (DRL) for robot control, prior studies have leveraged such symmetry to improve learning efficiency through data augmentation, equivariant multilayer perceptrons (EMLPs), and multi-agent reinforcement learning (MARL) formulations. However, DRL training is inherently unstable, as the data distribution strongly depends on exploration, which is driven by stochasticity in the environment. To address this issue, we propose a symmetry-assisted, general-purpose DRL framework for morphologically symmetric robots that enables stable and robust learning. The framework models the environment as a symmetric Markov decision process (MDP) and constructs a full-body policy from a single-sided base policy using symmetry operators. We further propose a symmetric PPO objective with a coupled importance-sampling ratio. This objective aligns the policy optimization process with the imposed symmetry and serves as a principled alternative to MAPPO-style multi-agent formulations. Experimental results demonstrate that the proposed method outperforms existing approaches on most symmetric tasks, while still maintaining performance comparable to or better than standard PPO on asymmetric tasks, where symmetry is less directly exploitable.
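
The construction summarized in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. Everything specific in it is an assumption made for illustration: the toy dimensions, the linear Gaussian base policy, the operators `G_S` and `G_A`, and in particular the reading of the "coupled importance-sampling ratio" as the product of the two per-side likelihood ratios, with the right side evaluated in the mirrored frame. Only the clipped surrogate is the standard PPO form.

```python
import numpy as np

# Hypothetical toy setup: 4-D state, 1-D action per body side (left, right).
# G_S mirrors the state by swapping the two body halves; G_A mirrors a
# single-side action (here a sign flip). Both are involutions.
G_S = np.array([[0., 0., 1., 0.],
                [0., 0., 0., 1.],
                [1., 0., 0., 0.],
                [0., 1., 0., 0.]])
G_A = np.array([[-1.0]])
SIGMA = 0.5  # fixed Gaussian std of the base policy (assumption)


def base_mean(theta, s):
    """Mean of the single-sided base policy: a linear map of the state."""
    return theta @ s


def full_body_mean(theta, s):
    """Left action from the base policy on s; right action from the base
    policy on the mirrored state, mapped back through G_A."""
    left = base_mean(theta, s)
    right = G_A @ base_mean(theta, G_S @ s)
    return np.concatenate([left, right])


def log_prob(theta, s, a_side):
    """Gaussian log-density of a single-side action under the base policy."""
    z = (a_side - base_mean(theta, s)) / SIGMA
    norm = a_side.size * np.log(SIGMA * np.sqrt(2.0 * np.pi))
    return float(-0.5 * np.sum(z ** 2) - norm)


def coupled_ratio(theta_new, theta_old, s, a_left, a_right):
    """Assumed coupling: product of the two per-side ratios, with the
    right-side action and the state both expressed in the mirrored frame."""
    s_m, a_m = G_S @ s, G_A @ a_right
    log_r = (log_prob(theta_new, s, a_left) - log_prob(theta_old, s, a_left)
             + log_prob(theta_new, s_m, a_m) - log_prob(theta_old, s_m, a_m))
    return float(np.exp(log_r))


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for a single sample."""
    clipped = float(np.clip(ratio, 1.0 - eps, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)
```

Under these assumptions the full-body policy is equivariant by construction: mirroring the state swaps the two side actions and mirrors each of them. Because both sides share the base-policy parameters, a single ratio per sample drives the update, in contrast to one ratio per agent in a multi-agent formulation.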
