In this paper, a swarm of five mobile robots is trained with deep reinforcement learning to solve a navigation problem with two targets, without prior knowledge of the environment. The Proximal Policy Optimization (PPO) method trains E-puck mobile robots to avoid obstacles while each robot completes its task over the shortest distance. The environment is modeled in three-dimensional space using the Webots simulator. The proposed algorithm works with continuous states derived from eight infrared sensors and with continuous action spaces that represent the velocities of the two motors of each robot in the swarm. The robots' behavior is then examined under two categories of rewards: sparse rewards and shaping rewards. Unless the environment's complexity is reduced, PPO with sparse rewards cannot train every robot in the system to reach its goal. Compared with the sparse-reward technique, shaping rewards help the robot gain experience from prior knowledge during training. This speeds up learning and aids navigation in more complex environments.
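The difference between the two reward categories can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual reward formulas, so everything here is an assumption made for illustration: the function names, the thresholds `GOAL_RADIUS` and `COLLISION_LEVEL`, the penalty weight, and the convention that infrared readings are normalised to [0, 1] with larger values meaning a closer obstacle.

```python
# Hypothetical constants, not taken from the paper.
GOAL_RADIUS = 0.05      # assumed distance (m) at which a target counts as reached
COLLISION_LEVEL = 0.9   # assumed normalised IR reading treated as a collision
PROXIMITY_WEIGHT = 0.1  # assumed weight of the obstacle-proximity penalty


def sparse_reward(distance_to_target, ir_readings):
    """Reward only on terminal events: reaching the target or colliding."""
    if distance_to_target <= GOAL_RADIUS:
        return 1.0
    if max(ir_readings) >= COLLISION_LEVEL:
        return -1.0
    return 0.0  # no feedback on all other steps


def shaped_reward(prev_distance, distance_to_target, ir_readings):
    """Terminal rewards plus a dense term that rewards progress toward the target."""
    terminal = sparse_reward(distance_to_target, ir_readings)
    if terminal != 0.0:
        return terminal
    progress = prev_distance - distance_to_target  # positive when moving closer
    proximity_penalty = PROXIMITY_WEIGHT * max(ir_readings)
    return progress - proximity_penalty


# Example step: eight IR sensors, robot moved 0.1 m closer with no obstacle nearby.
readings = [0.0] * 8
print(sparse_reward(0.4, readings))
print(shaped_reward(0.5, 0.4, readings))
```

In this sketch the sparse function returns zero on almost every step, which is one way to see why learning from it alone is slow in a cluttered environment, while the shaped function gives a signal on each step.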
Iskandar et al. studied this question.