This paper considers the multiarmed bandit problem and presents a new proof of the optimality of the Gittins index policy. The proof is intuitive and does not require an interchange argument. The insight it affords is used to give a streamlined summary of previous research and to prove a new result: The optimal value function is a submodular set function of the available projects.
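For readers unfamiliar with the term, the standard definition of submodularity can be written as below. The notation is an assumption made here for illustration and is not taken from the paper: V(S) stands for the optimal value when the set of available projects is S, and S and T are any two such sets.

```latex
% Standard definition of a submodular set function.
% V, S and T are illustrative symbols, not the paper's own notation.
\[
  V(S \cup T) + V(S \cap T) \;\le\; V(S) + V(T)
  \qquad \text{for all sets of projects } S,\, T .
\]
```

Read informally, this says that the marginal value of making one more project available does not increase as the set of projects already available grows.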
Richard Weber
University of Cambridge
The Annals of Applied Probability
Richard Weber studied this question.
synapsesocial.com/papers/6a1a5c4c382248a4518558cb — DOI: https://doi.org/10.1214/aoap/1177005588