To address the low exploration efficiency, numerous path turning points, and unstable convergence of the traditional Q-learning algorithm, an improved Q-learning method is proposed in which the exploration factor and the learning rate converge jointly over the iterations. By adjusting both quantities dynamically, the method better balances exploration and exploitation during training. In each iteration, the exploration strategy is adjusted according to the exploration factor in the current state, which lets the agent explore unknown regions in a more targeted way. At the same time, a dynamic learning rate scales the magnitude of each Q-value update according to the reliability of the accumulated experience. This joint strategy encourages broad exploration in the early stages of learning and deeper exploitation of learned knowledge in the later stages. The map is constructed using the grid method, and a comparison of simulation results shows that the improved algorithm outperforms the traditional one in both exploration efficiency and the number of path turning points.
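The abstract does not give the exact schedules, so the following is only a minimal sketch of the general idea: tabular Q-learning on a grid map in which the exploration factor and the learning rate decay together as training proceeds. The exponential form in `decayed`, all numeric constants, the reward values, and the helper names (`train`, `greedy_path`, `count_turns`) are assumptions made for illustration, not the authors' method.

```python
import math
import random

MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def decayed(start, end, episode, total_episodes, rate=5.0):
    """Decay a value from `start` toward `end` (assumed exponential schedule)."""
    return end + (start - end) * math.exp(-rate * episode / total_episodes)


def train(grid, start, goal, episodes=500, gamma=0.9, seed=0):
    """Tabular Q-learning where epsilon and alpha shrink jointly per episode."""
    rng = random.Random(seed)
    rows, cols = len(grid), len(grid[0])
    q = {(r, c): [0.0] * 4 for r in range(rows) for c in range(cols)}
    for ep in range(episodes):
        # Assumed joint schedule: both values are large early, small late.
        eps = decayed(0.9, 0.05, ep, episodes)
        alpha = decayed(0.8, 0.1, ep, episodes)
        state = start
        for _ in range(rows * cols * 4):  # step cap per episode
            if rng.random() < eps:
                a = rng.randrange(4)  # explore
            else:
                a = max(range(4), key=lambda i: q[state][i])  # exploit
            nr, nc = state[0] + MOVES[a][0], state[1] + MOVES[a][1]
            blocked = not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 1
            nxt = state if blocked else (nr, nc)
            # Assumed rewards: goal bonus, collision penalty, small step cost.
            reward = 10.0 if nxt == goal else (-1.0 if blocked else -0.1)
            target = reward + (0.0 if nxt == goal else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if state == goal:
                break
    return q


def greedy_path(q, grid, start, goal, limit=100):
    """Follow the greedy policy from `start`; stops at `goal` or after `limit` steps."""
    rows, cols = len(grid), len(grid[0])
    path, state = [start], start
    while state != goal and len(path) <= limit:
        a = max(range(4), key=lambda i: q[state][i])
        nr, nc = state[0] + MOVES[a][0], state[1] + MOVES[a][1]
        if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
            state = (nr, nc)
        path.append(state)
    return path


def count_turns(path):
    """Count direction changes along a path of grid cells."""
    steps = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(path, path[1:])]
    return sum(1 for s, t in zip(steps, steps[1:]) if s != t)


if __name__ == "__main__":
    # 0 = free cell, 1 = obstacle (a small hypothetical grid map).
    grid = [[0, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 0]]
    q = train(grid, (0, 0), (3, 3))
    path = greedy_path(q, grid, (0, 0), (3, 3))
    print(path, "turns:", count_turns(path))
```

Tying both schedules to the same episode counter is one simple way to make them converge jointly; a state-dependent exploration factor, as the abstract describes, would replace the per-episode `eps` with a per-state value.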
Yu et al. (Fri,) studied this question.