What type of study is this?

This is an experimental study.

October 31, 2025 | Open Access

Two-Layered Reward Reinforcement Learning in Humanoid Robot Motion Tracking

Key Points

Proposed two-layered reward reinforcement learning improves tracking accuracy in humanoid robot motion.
Compared with a static reward baseline, upper-body and lower-body link tracking accuracy improve by 7.58% and 10.30%, respectively.
Implemented an online optimization algorithm for adaptive reward shaping during training stages.
Results reveal enhanced learning robustness and potential for improved energy efficiency.

Abstract

In reinforcement learning (RL), reward function design is critical to the learning efficiency and final performance of agents. However, in complex tasks such as humanoid motion tracking, traditional static weighted reward functions struggle to adapt to shifting learning priorities across training stages, and designing a suitable shaping reward is problematic. To address these challenges, this paper proposes a two-layered reward reinforcement learning framework. The framework decomposes the reward into two layers: an upper-level goal reward that measures task completion, and a lower-level optimizing reward that includes auxiliary objectives such as stability, energy consumption, and motion smoothness. The key innovation lies in the online optimization of the lower-level reward weights via an online meta-heuristic optimization algorithm. This online adaptivity enables goal-conditioned reward shaping, allowing the reward structure to evolve autonomously without requiring expert demonstrations, thereby improving learning robustness and interpretability. The framework is tested on a gymnastic motion tracking problem for the Unitree G1 humanoid robot in the Isaac Gym simulation environment. The experimental results show that, compared to a static reward baseline, the proposed framework achieves 7.58% and 10.30% improvements in upper-body and lower-body link tracking accuracy, respectively. The resulting motions also exhibit better synchronization and reduced latency. The simulation results demonstrate the effectiveness of the framework in promoting efficient exploration, accelerating convergence, and enhancing motion imitation quality.
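
To make the two-layer structure concrete, the following is a minimal sketch, not the authors' implementation. It assumes the total reward is the upper-level goal reward plus a weighted sum of lower-level auxiliary terms, and it swaps in a simple (1+1)-style hill climber for the paper's meta-heuristic, which the abstract does not name. The term names (`stability`, `energy`, `smoothness`), the function `toy_goal_score`, and all numeric values are hypothetical. In a real setup, the goal score would come from a short training window in the simulator rather than from a closed-form function.

```python
import random
from dataclasses import dataclass


@dataclass
class TwoLayerReward:
    """Upper-level goal reward plus a weighted sum of lower-level auxiliary terms."""
    weights: dict  # hypothetical auxiliary weights, keyed by term name

    def total(self, goal_reward, aux_terms):
        # Lower-level shaping reward: weighted sum of the auxiliary objectives.
        shaping = sum(self.weights[k] * aux_terms[k] for k in self.weights)
        return goal_reward + shaping


def perturb(weights, sigma, rng):
    """Propose candidate weights by Gaussian perturbation, clipped at zero."""
    return {k: max(0.0, w + rng.gauss(0.0, sigma)) for k, w in weights.items()}


def adapt_weights(weights, evaluate_goal, iterations=50, sigma=0.1, seed=0):
    """(1+1)-style hill climbing on the lower-level weights.

    A candidate is kept only if the upper-level goal score does not decrease,
    so the goal reward steers how the shaping reward evolves.
    """
    rng = random.Random(seed)
    best, best_score = dict(weights), evaluate_goal(weights)
    for _ in range(iterations):
        candidate = perturb(best, sigma, rng)
        score = evaluate_goal(candidate)
        if score >= best_score:
            best, best_score = candidate, score
    return best, best_score


def toy_goal_score(weights):
    """Hypothetical stand-in for 'train briefly, then measure the goal reward'."""
    target = {"stability": 0.6, "energy": 0.1, "smoothness": 0.3}
    return -sum((weights[k] - target[k]) ** 2 for k in weights)


if __name__ == "__main__":
    initial = {"stability": 1.0, "energy": 1.0, "smoothness": 1.0}
    tuned, score = adapt_weights(initial, toy_goal_score)
    reward = TwoLayerReward(tuned)
    aux = {"stability": 0.8, "energy": -0.5, "smoothness": 0.4}
    print(tuned, score, reward.total(goal_reward=1.0, aux_terms=aux))
```

The acceptance rule is what ties the two layers together: the lower-level weights change only when the upper-level goal score holds or improves, which is one simple way to realize goal-conditioned reward shaping without expert demonstrations.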
