ABSTRACT Dynamic obstacle avoidance remains a key challenge in robotic arm motion planning, as traditional algorithms struggle to balance adaptive decision‐making with precise trajectory generation in unstructured environments. We present a hierarchical motion planning framework that combines proximal policy optimization (PPO) with the rapidly exploring random tree star (RRT*) algorithm and is trained under a curriculum learning paradigm. PPO learns global obstacle avoidance strategies through progressively more difficult training scenarios, while RRT* refines local trajectories to compensate for PPO's limitations in fine motor control. A multiobjective reward function, incorporating step‐efficiency terms and artificial potential field principles, balances exploration and exploitation through tailored penalties and rewards. In dynamic obstacle scenarios, the proposed method achieves an 87.6% success rate, outperforming both standalone PPO and existing hybrid reinforcement learning approaches. This framework offers a practical solution for dynamic obstacle avoidance, with broader applicability to high‐dimensional autonomous manipulation tasks.
Ma et al. (Sun,) studied this question.
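
To make the reward structure described in the abstract more concrete, the following is a minimal sketch of how a per-step reward might combine a step-efficiency penalty with artificial potential field (APF) terms. It is not the authors' reward function: the function name `apf_reward`, every weight and threshold, and the specific attractive (linear in goal distance) and repulsive (classic inverse-distance APF) forms are assumptions chosen purely for illustration.

```python
import math


def apf_reward(
    ee_pos,
    goal_pos,
    obstacle_pos,
    *,
    k_att=1.0,               # assumed weight of the attractive (goal) term
    k_rep=0.5,               # assumed weight of the repulsive (obstacle) term
    d_safe=0.2,              # assumed influence radius of the obstacle
    step_penalty=0.01,       # assumed constant cost per step (step efficiency)
    goal_tol=0.02,           # assumed distance at which the goal counts as reached
    goal_bonus=10.0,         # assumed terminal reward for reaching the goal
    collision_dist=0.05,     # assumed distance at which a collision is declared
    collision_penalty=10.0,  # assumed terminal penalty for a collision
):
    """Hypothetical per-step reward: step cost plus APF-style shaping."""
    d_goal = math.dist(ee_pos, goal_pos)
    d_obs = math.dist(ee_pos, obstacle_pos)

    # Terminal cases: collision is checked first, then success.
    if d_obs <= collision_dist:
        return -collision_penalty
    if d_goal <= goal_tol:
        return goal_bonus

    # Step-efficiency term: every non-terminal step costs a little.
    reward = -step_penalty

    # Attractive term: penalize remaining distance to the goal.
    reward -= k_att * d_goal

    # Repulsive term: classic APF potential, active only inside d_safe.
    if d_obs < d_safe:
        reward -= 0.5 * k_rep * (1.0 / d_obs - 1.0 / d_safe) ** 2

    return reward
```

Under these assumed parameters, an end effector far from the obstacle is penalized only for elapsed steps and remaining goal distance, while the repulsive term grows rapidly as the obstacle is approached. In a curriculum setting, parameters such as `d_safe` or the obstacle speed could be tightened from stage to stage, although the abstract does not specify how the authors schedule difficulty.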