This paper studies the sequential decision model known as the two-armed bandit with finite memory, introduced by Robbins [8] in 1956 and studied further by Isbell [5] in 1959. A set of rules is first defined that are uniformly better than those given in [5] and [8]. A much larger class of rules is then defined, one member of which is conjectured to be a uniformly best rule.
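To make the finite-memory setting concrete, the sketch below evaluates one simple rule: stay on the current arm and switch after `r` consecutive failures. This rule, and the names `expected_stint_length`, `long_run_success_rate` and `simulate`, are our illustrative assumptions; we do not claim it is exactly the rule of [8] or [5], nor one of the improved rules defined in this paper. The rule only has to remember the current arm and a failure count in `0..r-1`, i.e. `2r` memory states. Its long-run success rate follows from renewal-reward: a cycle is one stint on each arm, and by Wald's identity a stint of expected length `T_i` yields `p_i * T_i` expected successes.

```python
import random


def expected_stint_length(p: float, r: int) -> float:
    """Expected plays on an arm with success probability p (0 < p < 1)
    until r consecutive failures occur: (1 - q^r) / (p q^r), q = 1 - p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = 1.0 - p
    return (1.0 - q**r) / (p * q**r)


def long_run_success_rate(p1: float, p2: float, r: int) -> float:
    """Long-run fraction of successes for the illustrative rule
    'switch arms after r consecutive failures' (renewal-reward + Wald)."""
    t1 = expected_stint_length(p1, r)
    t2 = expected_stint_length(p2, r)
    return (p1 * t1 + p2 * t2) / (t1 + t2)


def simulate(p1: float, p2: float, r: int, n: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same rate over n plays.
    State = (current arm, consecutive-failure count): 2r states in total."""
    rng = random.Random(seed)
    probs = (p1, p2)
    arm, failures, successes = 0, 0, 0
    for _ in range(n):
        if rng.random() < probs[arm]:
            successes += 1
            failures = 0  # a success resets the failure run
        else:
            failures += 1
            if failures == r:  # r failures in a row: switch arms
                arm, failures = 1 - arm, 0
    return successes / n


if __name__ == "__main__":
    for r in (1, 2, 5, 10):
        print(r, long_run_success_rate(0.7, 0.4, r), simulate(0.7, 0.4, r, 100_000))
```

For success probabilities 0.7 and 0.4, the rate is exactly 0.6 with `r = 1` and approaches 0.7 (the better arm) as `r` grows. This illustrates the trade-off that a larger memory buys: longer, more reliable stints on the better arm.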
Smith et al. (Fri,) studied this question.