SUMMARY
This paper introduces the multi-armed delayed response bandit with geometric discounting. The existence of dynamic allocation indices is shown when the discount factor is less than 1/2 or when the information bank size is zero. For the multi-armed delayed response bandit, the arm indicated by the dynamic allocation procedure or Gittins procedure is optimal when all information bank sizes are zero. A computational method for calculating indices is presented. The idea is to approximate the optimal strategy using a class of strategies whose worths are easy to calculate.
Stephen G. Eick studied this question.
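To make the notion of a dynamic allocation index concrete, the sketch below computes an approximate Gittins index for the standard Bernoulli bandit arm with immediate responses, which is a simpler setting than the delayed response bandit of the summary. It is not the paper's computational method: it swaps in the well-known calibration approach (compare the arm against a known arm paying a fixed reward, and bisect on that reward) with a finite look-ahead truncation. The function name `gittins_index_bernoulli`, the Beta(a, b) prior, and the parameters `horizon` and `tol` are all assumptions chosen for illustration.

```python
from functools import lru_cache


def gittins_index_bernoulli(a, b, beta, horizon=50, tol=1e-4):
    """Approximate Gittins index of a Bernoulli arm with a Beta(a, b) prior.

    Assumptions (not from the paper): responses are observed immediately,
    rewards are 0 or 1, and look-ahead is truncated after `horizon` pulls,
    beyond which the better of the posterior mean and the known reward is
    collected forever.
    """

    def risky_minus_safe(lam):
        # Worth of retiring now to a known arm paying `lam` on every pull.
        safe = lam / (1.0 - beta)

        @lru_cache(maxsize=None)
        def value(s, f):
            # Optimal worth with s prior successes and f prior failures.
            p = s / (s + f)
            if (s + f) - (a + b) >= horizon:
                return max(p, lam) / (1.0 - beta)
            return max(safe, pull(s, f))

        def pull(s, f):
            # Worth of pulling the unknown arm once, then acting optimally.
            p = s / (s + f)
            return (p * (1.0 + beta * value(s + 1, f))
                    + (1.0 - p) * beta * value(s, f + 1))

        return pull(a, b) - safe

    # The index is the known reward at which pulling and retiring are
    # equally attractive; the difference decreases in `lam`, so bisect.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if risky_minus_safe(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

A call such as `gittins_index_bernoulli(1, 1, beta=0.9)` gives the approximate index of an arm with a uniform prior. Under the dynamic allocation procedure, one would compute such an index for each arm and pull the arm with the largest value. Extending the sketch to delayed responses would require tracking outstanding observations in the state, which is not attempted here.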