Long-horizon mobile manipulation requires a robot to execute a sequence of heterogeneous subtasks, such as navigation, picking, and articulated-object manipulation, in indoor environments. Standard reinforcement learning suffers from reward sparsity and inefficient exploration in this setting, and hierarchical methods often fail at the hand-off between consecutive subtasks when the terminal state of one subtask is kinematically infeasible for the next. We propose a motion-planning-augmented hierarchical reinforcement learning architecture that addresses the trade-off between sample efficiency and hand-off reliability in long-horizon mobile manipulation. The mission is decomposed into subtasks via a Semi-Markov Decision Process. Within each subtask, a collision-free reference trajectory generated by RRT* in the full joint configuration space is embedded into the reward as a per-step shaping signal. A region-goal mechanism, defined analytically from inverse kinematics feasibility, replaces rigid coordinate hand-offs with a continuous feasible region. The architecture is evaluated in the ManiSkill-HAB simulation under teleport-free sequential execution and challenging initialization. The proposed method improves subtask success rate and sample efficiency over the baseline across all six evaluated subtasks, and the advantage compounds along the long-horizon task chain.
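To make the two mechanisms concrete, the following is a minimal Python sketch under stated assumptions, not the implementation evaluated in the paper. The abstract does not give the reward formula or the region definition, so the progress-plus-deviation form of `shaping_reward`, the weights `w_progress` and `w_track`, and the annulus test in `in_region_goal` (with hypothetical reach bounds `r_min` and `r_max`) are illustrative stand-ins. The reference trajectory is a straight line here rather than the output of an RRT* planner.

```python
import numpy as np

def nearest_waypoint_index(q, reference):
    """Index of the reference waypoint closest to configuration q."""
    return int(np.argmin(np.linalg.norm(reference - q, axis=1)))

def shaping_reward(q_prev, q_next, reference, w_progress=1.0, w_track=0.1):
    """Per-step shaping term (assumed form, not taken from the paper).

    Rewards progress along the reference trajectory, measured in
    waypoint indices, and penalizes distance from the nearest waypoint.
    """
    i_prev = nearest_waypoint_index(q_prev, reference)
    i_next = nearest_waypoint_index(q_next, reference)
    progress = (i_next - i_prev) / (len(reference) - 1)
    deviation = np.linalg.norm(reference[i_next] - q_next)
    return w_progress * progress - w_track * deviation

def in_region_goal(base_xy, target_xy, r_min=0.3, r_max=0.8):
    """Region goal as an annulus around the target (hypothetical bounds).

    Stands in for an inverse-kinematics feasibility test: the hand-off
    succeeds anywhere the target lies within the arm's assumed reach,
    instead of at a single fixed coordinate.
    """
    d = np.linalg.norm(np.asarray(target_xy) - np.asarray(base_xy))
    return r_min <= d <= r_max

# Stand-in reference: a straight line in a 2-D configuration space.
# In the described architecture this would come from RRT*.
reference = np.linspace([0.0, 0.0], [1.0, 1.0], num=11)

r_forward = shaping_reward(reference[0], reference[5], reference)
r_backward = shaping_reward(reference[5], reference[0], reference)
handoff_ok = in_region_goal((0.0, 0.0), (0.5, 0.0))
```

In this sketch a step that advances along the reference earns a positive shaping term and a step that retreats earns a negative one, which gives a dense learning signal at every step. The region test accepts any base pose inside the annulus, which illustrates how a continuous feasible region can relax a rigid coordinate hand-off.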
Kim et al. studied this question.