Discrete Dynamic Programming

Key Points

Key points are not available for this paper at this time.

Abstract

We consider a system with a finite number S of states s, labeled by the integers 1, 2, , S. Periodically, say once a day, we observe the current state of the system, and then choose an action a from a finite set A of possible actions. As a joint result of the current state s and the chosen action a, two things happen: (1) we receive an immediate income i (s, a) and (2) the system moves to a new state s' with the probability of a particular new state s' given by a function q = q (s' s, a). Finally there is specified a discount factor, 0 < 1, so that the value of unit income n days in the future is ⁿ. Our problem is to choose a policy which maximizes our total expected income. This problem, which is an interesting special case of the general dynamic programming problem, has been solved by Howard in his excellent book 3. The case = 1, also studied by Howard, is substantially more difficult. We shall obtain in this case results slightly beyond those of Howard, though still not complete. Our method, which treats = 1 as a limiting case of < 1, seems rather simpler than Howard's.

Mark Helpful

Bookmark

Relay

Cite This Study

David Blackwell (Fri,) studied this question.

synapsesocial.com/papers/6a22618ef6fb2c59e553c725 https://doi.org/https://doi.org/10.1214/aoms/1177704593

Mark Helpful

Bookmark

Relay