Gittins Procedures for Bandits with Delayed Responses

Abstract

This paper introduces the multi-armed delayed response bandit with geometric discounting. The existence of dynamic allocation indices is shown when the discount factor is less than 1/2 or when the information bank size is zero. For the multi-armed delayed response bandit, the arm indicated by the dynamic allocation procedure or Gittins procedure is optimal when all information bank sizes are zero. A computational method for calculating indices is presented. The idea is to approximate the optimal strategy using a class of strategies whose worths are easy to calculate.
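
For orientation, the classical dynamic allocation (Gittins) index for an arm with immediate responses can be written as below. This is the standard textbook definition, not the delayed-response index constructed in the paper; the symbols $\beta$ (discount factor), $x$ (current arm state), $r$ (expected one-step reward), and $\tau$ (stopping time) are notation assumed here for illustration.

```latex
% Classical Gittins index of an arm in state x (immediate responses assumed).
% beta: geometric discount factor, 0 < beta < 1 (assumed notation)
% r(x_t): expected reward from pulling the arm in state x_t (assumed notation)
% tau: a stopping time with tau >= 1
\[
  \nu(x) \;=\; \sup_{\tau \ge 1}
  \frac{\mathbb{E}\!\left[\sum_{t=0}^{\tau-1} \beta^{t}\, r(x_t) \,\middle|\, x_0 = x\right]}
       {\mathbb{E}\!\left[\sum_{t=0}^{\tau-1} \beta^{t} \,\middle|\, x_0 = x\right]}
\]
```

In the classical setting, the Gittins procedure pulls the arm with the largest index at each step. The abstract's conditions (discount factor below 1/2, or information bank size zero) describe when indices of this kind exist once responses are delayed.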
