To achieve efficient and stable autonomous operation when a robotic excavator performs plane-trimming tasks, this study proposes an online trajectory planning method based on deep reinforcement learning (RL) using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. A simulation environment is constructed to generate training data: the joint angles of the boom, arm, and bucket of the excavator's working device serve as the state observation variables, and the angle change of each joint constitutes the action. The simulation environment and the learning algorithm interact through these state observations, and the policy network is trained using a reward function. Under identical experimental conditions, the proposed algorithm requires less training time than other RL algorithms designed for continuous action spaces. Specifically, its training time is reduced by 24.81%, 40.29%, and 34.51% compared with the DDPG, traditional TD3, and TRPO algorithms, respectively. In addition, the time required to complete a given task is reduced by 1.807, 3.703, and 5.011 s compared with the DDPG, TRPO, and traditional TD3 algorithms, respectively. These results indicate that the proposed optimization algorithm offers improved efficiency and faster convergence than the DDPG, traditional TD3, and TRPO algorithms, ultimately generating an efficient task trajectory. Moreover, the method effectively reduces large impacts on each joint, so that the robotic excavator system operates with high efficiency and stability.
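The state and action formulation described above, together with the two target-side mechanisms of the standard TD3 algorithm, can be illustrated with a minimal sketch. This is not the authors' modified TD3 or their simulation environment: the joint limits, the per-step bound `MAX_DELTA`, the noise parameters, and the function names `env_step`, `td3_target`, and `smoothed_target_action` are all assumptions made for illustration, and the reward function is omitted because the abstract does not specify it.

```python
import numpy as np

# Hypothetical joint limits in radians for (boom, arm, bucket); not from the source.
JOINT_LOW = np.array([-0.5, -2.5, -2.5])
JOINT_HIGH = np.array([1.0, -0.5, 0.5])
MAX_DELTA = 0.05  # assumed bound on each joint-angle change per step


def env_step(joint_angles, action):
    """State = joint angles, action = angle change of each joint.

    The action is bounded, added to the current angles, and the result
    is kept inside the (assumed) joint limits.
    """
    delta = np.clip(action, -MAX_DELTA, MAX_DELTA)
    return np.clip(joint_angles + delta, JOINT_LOW, JOINT_HIGH)


def smoothed_target_action(target_action, rng, sigma=0.01, noise_clip=0.025):
    """Standard TD3 target policy smoothing.

    Adds clipped Gaussian noise to the target actor's output, then
    re-applies the action bound. sigma and noise_clip are assumed values.
    """
    noise = rng.normal(0.0, sigma, size=target_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    return np.clip(target_action + noise, -MAX_DELTA, MAX_DELTA)


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Standard TD3 clipped double-Q target.

    Uses the smaller of the two target-critic estimates to limit
    overestimation; done = 1.0 removes the bootstrap term.
    """
    return reward + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state = np.array([0.0, -1.0, 0.0])
    next_state = env_step(state, np.array([0.1, -0.01, 0.0]))
    print("next joint angles:", next_state)
    print("smoothed action:", smoothed_target_action(np.zeros(3), rng))
    print("critic target:", td3_target(1.0, 0.0, 2.0, 3.0))
```

Bounding the per-step angle change is one simple way to keep joint motion smooth; whether the authors limit joint impacts this way or through the reward function is not stated in the abstract.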
Zhang et al. studied this question.