June 4, 2012 · Open Access

Dynamic potential-based reward shaping

Abstract

Potential-based reward shaping can significantly reduce the time needed to learn an optimal policy and, in multi-agent systems, improve the performance of the final joint policy. It has been proven not to alter the optimal policy of an agent learning alone or the Nash equilibria of multiple agents learning together. However, a limitation of existing proofs is the assumption that the potential of a state does not change dynamically during learning. This assumption is often broken, especially if the reward-shaping function is generated automatically. In this paper we prove and demonstrate a method of extending potential-based reward shaping to allow dynamic shaping while maintaining the guarantees of policy invariance in the single-agent case and consistent Nash equilibria in the multi-agent case.
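The abstract does not state the shaping function itself. As a rough illustration, the sketch below assumes the commonly used potential-based form extended with a time argument, F(s, t, s', t') = gamma * Phi(s', t') - Phi(s, t), where t and t' are the times at which the agent arrives at s and s'. The potential `dynamic_potential`, the integer state encoding, and the sample trajectories are hypothetical and chosen only for the demonstration. It shows that the discounted sum of shaping rewards along a trajectory telescopes to a quantity that depends only on the endpoints, which is the intuition behind the invariance guarantees.

```python
import math


def dynamic_potential(state: int, t: int) -> float:
    """Hypothetical time-varying potential Phi(s, t); any function would do."""
    return math.sin(state) + 0.1 * t * state


def shaping_reward(s, t, s_next, t_next, gamma, phi=dynamic_potential):
    """Assumed dynamic shaping term: gamma * Phi(s', t') - Phi(s, t)."""
    return gamma * phi(s_next, t_next) - phi(s, t)


def discounted_shaping_return(states, gamma, phi=dynamic_potential):
    """Sum of gamma^k * F over a trajectory; step k arrives at states[k] at time k."""
    total = 0.0
    for k in range(len(states) - 1):
        total += gamma ** k * shaping_reward(
            states[k], k, states[k + 1], k + 1, gamma, phi
        )
    return total


gamma = 0.9
trajectory = [0, 3, 1, 4, 2]        # hypothetical state sequence
alternative = [0, 1, 4, 3, 2]       # same endpoints and length, different path
T = len(trajectory) - 1

# Telescoped value: gamma^T * Phi(s_T, T) - Phi(s_0, 0)
endpoint_only = (
    gamma ** T * dynamic_potential(trajectory[-1], T)
    - dynamic_potential(trajectory[0], 0)
)

path_sum = discounted_shaping_return(trajectory, gamma)
alt_sum = discounted_shaping_return(alternative, gamma)
print(path_sum, alt_sum, endpoint_only)
```

Both trajectories yield the same shaping return even though the potential changes at every step, because each intermediate potential is added on arrival and subtracted on departure using the same time index. If the departure term instead used a later, updated potential for the same state, the terms would no longer cancel.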

Cite This Study

Devlin et al. (2012) studied this question.

synapsesocial.com/papers/6a2227d71b095894fc4ed235 https://doi.org/10.5555/2343576.2343638
