June 19, 2026 · Open Access

Motion Planning-Augmented Hierarchical Reinforcement Learning for Long-Horizon Mobile Manipulation

Key points

This research aims to enhance long-horizon mobile manipulation by addressing the challenges of reward sparsity and hand-off reliability in reinforcement learning.
Developed a motion planning-augmented hierarchical reinforcement learning architecture.
Used a Semi-Markov Decision Process to decompose the mission into subtasks, and embedded a collision-free reference trajectory, generated by RRT* in the full joint configuration space, into each subtask's reward as a per-step shaping signal.
Introduced a region-goal mechanism, defined analytically from inverse-kinematics feasibility, that replaces rigid coordinate hand-offs between subtasks with a continuous feasible region.
Improved subtask success rate and sample efficiency over the baseline across all six evaluated subtasks.
The advantage compounded along the long-horizon task chain.
Achieved reliable hand-offs between consecutive subtasks by avoiding terminal states that are kinematically infeasible for the next subtask.
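The region-goal hand-off listed in the key points can be illustrated with a minimal sketch for a hypothetical planar 2-link arm on a mobile base. For such an arm, inverse kinematics has a solution exactly when the base-to-target distance d satisfies |L1 - L2| <= d <= L1 + L2, so the set of acceptable terminal base positions for a navigation subtask is an annulus around the object rather than a single coordinate. The arm model, link lengths, tolerance, coordinates, and function names (`planar_ik`, `in_point_goal`, `in_region_goal`) are all assumptions for illustration; base heading is ignored because the shoulder angle q1 absorbs it. This is not the paper's analytic region for its actual robot.

```python
import math
from typing import Optional, Sequence, Tuple

# Hypothetical planar 2-link arm mounted on a mobile base; the link lengths
# are assumed values, not the robot used in the paper.
L1, L2 = 0.5, 0.4  # metres


def planar_ik(base: Sequence[float], target: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Closed-form IK (branch with q2 >= 0). Returns (q1, q2), or None if unreachable."""
    dx, dy = target[0] - base[0], target[1] - base[1]
    cos_q2 = (dx * dx + dy * dy - L1 * L1 - L2 * L2) / (2.0 * L1 * L2)
    if abs(cos_q2) > 1.0:
        return None  # target outside the reachable annulus |L1 - L2| <= d <= L1 + L2
    q2 = math.acos(cos_q2)
    q1 = math.atan2(dy, dx) - math.atan2(L2 * math.sin(q2), L1 + L2 * math.cos(q2))
    return q1, q2


def in_point_goal(base: Sequence[float], goal: Sequence[float], tol: float = 0.05) -> bool:
    """Rigid coordinate hand-off: the base must stop within `tol` of one fixed pose."""
    return math.dist(base, goal) <= tol


def in_region_goal(base: Sequence[float], target: Sequence[float]) -> bool:
    """Region-goal hand-off: any base position from which the next subtask's IK is feasible."""
    return planar_ik(base, target) is not None


if __name__ == "__main__":
    target = (2.0, 1.0)         # object the next (pick) subtask must reach
    point_goal = (1.4, 1.0)     # single hand-picked base pose for navigation
    terminal_base = (1.3, 0.8)  # where navigation actually stopped
    print(in_point_goal(terminal_base, point_goal))  # False: about 0.22 m from the point goal
    print(in_region_goal(terminal_base, target))     # True: target about 0.73 m away, within reach
```

In this toy case the rigid hand-off rejects a terminal pose from which the pick is perfectly feasible, while the region goal accepts it; the annulus test also rejects poses that are too far from or too close to the object.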

Abstract

Long-horizon mobile manipulation requires a robot to execute a sequence of heterogeneous subtasks such as navigation, picking, and articulated-object manipulation in indoor environments. Standard reinforcement learning suffers from reward sparsity and inefficient exploration in this setting, and hierarchical methods often fail at the hand-off between consecutive subtasks when the terminal state of one subtask is kinematically infeasible for the next. We propose a motion planning-augmented hierarchical reinforcement learning architecture to resolve the fundamental trade-off between sample efficiency and hand-off reliability in long-horizon mobile manipulation. The mission is decomposed into subtasks via a Semi-Markov Decision Process; within each subtask, a collision-free reference trajectory generated by RRT* in the full joint configuration space is embedded into the reward as a per-step shaping signal; and a region-goal mechanism, defined analytically from inverse-kinematics feasibility, replaces rigid coordinate hand-offs with a continuous feasible region. The architecture is evaluated in the ManiSkill-HAB simulation benchmark under teleport-free sequential execution and challenging initial conditions. The proposed method improves subtask success rate and sample efficiency over the baseline across all six evaluated subtasks, and the advantage compounds along the long-horizon task chain.
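To show what treating each subtask as one decision in a Semi-Markov Decision Process means, here is a textbook sketch, not the paper's training procedure: a subtask that runs for tau low-level steps contributes its discounted reward R = sum over k of gamma^k * r_(t+k), and the value of whatever follows is discounted by gamma^tau, as in standard SMDP Q-learning. The state labels, subtask names, hyperparameters, and helper names (`new_q_table`, `option_return`, `smdp_q_update`) are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def new_q_table() -> QTable:
    """Q-values over (high-level state, subtask) pairs, defaulting to 0."""
    return defaultdict(float)


def option_return(rewards: Sequence[float], gamma: float) -> Tuple[float, int]:
    """Discounted reward collected while one subtask runs, plus its duration tau in steps."""
    ret = 0.0
    for k, r in enumerate(rewards):
        ret += (gamma ** k) * r
    return ret, len(rewards)


def smdp_q_update(Q: QTable, s, o, rewards, s_next, next_subtasks,
                  alpha: float = 0.1, gamma: float = 0.99) -> float:
    """SMDP Q-learning: Q(s,o) += alpha * (R + gamma**tau * max_o' Q(s',o') - Q(s,o))."""
    R, tau = option_return(rewards, gamma)
    # An empty `next_subtasks` marks the end of the task chain (no bootstrapping).
    best_next = max((Q[(s_next, o2)] for o2 in next_subtasks), default=0.0)
    target = R + (gamma ** tau) * best_next
    Q[(s, o)] += alpha * (target - Q[(s, o)])
    return Q[(s, o)]


if __name__ == "__main__":
    Q = new_q_table()
    Q[("at_counter", "pick")] = 2.0
    # Hypothetical navigation subtask: 3 low-level steps, each rewarded 1.0.
    print(smdp_q_update(Q, "start", "navigate", [1.0, 1.0, 1.0],
                        "at_counter", ["pick"], alpha=0.5, gamma=0.5))  # 1.0
```

The gamma^tau factor is what distinguishes this from an ordinary one-step MDP update: a long subtask delays, and therefore discounts, everything that comes after it in the chain.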
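The abstract states that an RRT* reference trajectory in joint space is embedded into the reward as a per-step shaping signal, but it does not give the formula. A minimal sketch, assuming potential-based shaping with a potential Phi(q) equal to minus the distance from q to the closest path segment plus the arc length still to go, could look as follows. The potential, the helper names (`remaining_length`, `path_potential`, `shaped_reward`), the toy 2-DoF path, and gamma are all assumptions; the path here is written by hand rather than produced by RRT*.

```python
import math
from typing import List, Sequence

Config = Sequence[float]  # one joint-space configuration


def remaining_length(path: List[Config]) -> List[float]:
    """rem[i] = arc length from waypoint i to the end of the reference path."""
    rem = [0.0] * len(path)
    for i in range(len(path) - 2, -1, -1):
        rem[i] = rem[i + 1] + math.dist(path[i], path[i + 1])
    return rem


def path_potential(q: Config, path: List[Config], rem: List[float]) -> float:
    """Phi(q) = -(distance to the closest path segment + arc length still to go).

    Assumes the path has at least two waypoints.
    """
    best_dev, best_to_go = math.inf, 0.0
    for i in range(len(path) - 1):
        p0, p1 = path[i], path[i + 1]
        d = [b - a for a, b in zip(p0, p1)]
        seg_len2 = sum(x * x for x in d)
        # Project q onto segment i, clamped to its endpoints.
        t = 0.0
        if seg_len2 > 0.0:
            t = sum((qi - a) * di for qi, a, di in zip(q, p0, d)) / seg_len2
            t = min(1.0, max(0.0, t))
        proj = [a + t * di for a, di in zip(p0, d)]
        dev = math.dist(q, proj)
        if dev < best_dev:
            best_dev = dev
            best_to_go = (1.0 - t) * math.sqrt(seg_len2) + rem[i + 1]
    return -(best_dev + best_to_go)


def shaped_reward(r: float, q: Config, q_next: Config, path: List[Config],
                  rem: List[float], gamma: float = 0.99) -> float:
    """Potential-based shaping: r + gamma * Phi(q_next) - Phi(q)."""
    return r + gamma * path_potential(q_next, path, rem) - path_potential(q, path, rem)


if __name__ == "__main__":
    # Toy 2-DoF reference path standing in for an RRT* output in joint space.
    path = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
    rem = remaining_length(path)  # [2.0, 1.0, 0.0]
    # Moving along the path from 0.1 to 0.6 earns roughly the progress made (0.5).
    print(shaped_reward(0.0, [0.1, 0.0], [0.6, 0.0], path, rem, gamma=1.0))
```

Potential-based shaping is a convenient choice because it leaves the optimal policy unchanged while still rewarding progress along, and penalizing deviation from, the reference path at every step; whether the paper's shaping term has this property is not stated in the abstract.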
