January 1, 2014

Multi-objective reinforcement learning using sets of Pareto dominating policies

Abstract

Many real-world problems involve the optimization of multiple, possibly conflicting objectives. Multi-objective reinforcement learning (MORL) is a generalization of standard reinforcement learning where the scalar reward signal is extended to multiple feedback signals, in essence, one for each objective. MORL is the process of learning policies that optimize multiple criteria simultaneously. In this paper, we present a novel temporal difference learning algorithm that integrates the Pareto dominance relation into a reinforcement learning approach. This algorithm is a multi-policy algorithm that learns a set of Pareto dominating policies in a single run. We name this algorithm Pareto Q-learning and it is applicable in episodic environments with deterministic as well as stochastic transition functions. A crucial aspect of Pareto Q-learning is the updating mechanism that bootstraps sets of Q-vectors. One of our main contributions in this paper is a mechanism that separates the expected immediate reward vector from the set of expected future discounted reward vectors. This decomposition allows us to update the sets and to exploit the learned policies consistently throughout the state space. To balance exploration and exploitation
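The two ideas the abstract names, the Pareto dominance relation and the separation of the immediate reward vector from the set of future discounted reward vectors, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`dominates`, `non_dominated`, `q_set`), the tuple representation of reward vectors, the maximization convention, and the handling of an empty future set are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: a reward or Q-vector is a tuple with one entry per objective,
# and every objective is to be maximized.
Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """Return True if a Pareto-dominates b.

    a dominates b when a is at least as good in every objective
    and strictly better in at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def non_dominated(vectors: List[Vector]) -> List[Vector]:
    """Return the unique vectors that no other vector in the list dominates."""
    unique = list(dict.fromkeys(vectors))  # drop duplicates, keep order
    return [v for v in unique if not any(dominates(w, v) for w in unique)]


def q_set(avg_reward: Vector, future_set: List[Vector], gamma: float) -> List[Vector]:
    """Build a set of Q-vectors from two separately stored parts.

    avg_reward: estimate of the expected immediate reward vector (hypothetical name).
    future_set: non-dominated future discounted reward vectors of the next state.
    gamma:      discount factor.

    Assumption: an empty future set contributes nothing, so the result is
    just the immediate reward vector.
    """
    if not future_set:
        return [avg_reward]
    return [
        tuple(r + gamma * f for r, f in zip(avg_reward, v))
        for v in future_set
    ]


if __name__ == "__main__":
    candidates = [(1.0, 5.0), (3.0, 3.0), (2.0, 2.0), (5.0, 1.0)]
    print(non_dominated(candidates))
    print(q_set((1.0, 0.0), [(2.0, 4.0), (4.0, 2.0)], 0.5))
```

Storing the immediate reward and the future set separately, as in `q_set`, means the set of Q-vectors is recomputed on demand rather than updated element by element, which is one plausible reading of the decomposition the abstract describes.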

Cite This Study

Moffaert et al. (2014) studied this question.

synapsesocial.com/papers/6a0f18eaa00258d2006c8d90 https://doi.org/10.5555/2627435.2750356
