We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. Our approach can handle the case where our prior beliefs about the rewards are correlated, which traditional multiarmed bandit methods do not handle. Experiments show that our knowledge gradient (KG) policy performs competitively against the best-known approximation to the optimal policy in the classic bandit problem, and that it outperforms many learning policies in the correlated case.
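To make the idea of a one-period look-ahead concrete, here is a minimal sketch of a knowledge gradient (KG) style decision rule for the simpler case of independent Gaussian beliefs with known measurement noise. The abstract does not state any formula, so everything below is an assumption: the function names (`kg_factor`, `online_kg_choice`), the parameter `remaining` (number of rewards still to be collected), and the scoring rule "posterior mean plus `remaining` times the one-step value of information" are illustrative, and the correlated-belief case the paper emphasizes is not covered.

```python
import math


def normal_pdf(z):
    # Standard normal density.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    # Standard normal cumulative distribution, via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kg_factor(means, variances, noise_var, x):
    """Expected one-step gain in the best posterior mean from measuring arm x.

    Assumes independent normal beliefs (means[i], variances[i]) and a
    known, common measurement noise variance noise_var.
    """
    # Standard deviation of the change in arm x's posterior mean
    # caused by one additional noisy observation.
    sigma_tilde = variances[x] / math.sqrt(variances[x] + noise_var)
    if sigma_tilde == 0.0:
        return 0.0  # nothing left to learn about this arm
    best_other = max(m for i, m in enumerate(means) if i != x)
    zeta = -abs(means[x] - best_other) / sigma_tilde
    return sigma_tilde * (zeta * normal_cdf(zeta) + normal_pdf(zeta))


def online_kg_choice(means, variances, noise_var, remaining):
    """Pick the arm maximizing immediate reward plus scaled learning value.

    `remaining` is a hypothetical parameter: how many further rewards
    will be collected after this one (0 reduces to a greedy choice).
    """
    scores = [
        means[x] + remaining * kg_factor(means, variances, noise_var, x)
        for x in range(len(means))
    ]
    return max(range(len(means)), key=scores.__getitem__)


if __name__ == "__main__":
    beliefs_mean = [1.0, 1.2]
    beliefs_var = [1.0, 0.0]  # arm 1 is already known exactly
    print(online_kg_choice(beliefs_mean, beliefs_var, 1.0, remaining=0))
    print(online_kg_choice(beliefs_mean, beliefs_var, 1.0, remaining=10))
```

In this sketch, an arm with a slightly lower mean but high uncertainty can be preferred when many rewards remain, because what is learned now can be exploited later; with no rewards remaining, the rule falls back to picking the highest mean.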
Ilya O. Ryzhov
Princeton University
Warren B. Powell
Princeton University
Peter I. Frazier
Cornell University
Operations Research
Cornell University
Princeton University
University of Maryland, College Park
Ryzhov et al. studied this question.
synapsesocial.com/papers/6a1771f81723722a886ea3e4 — DOI: https://doi.org/10.1287/opre.1110.0999