Key points are not available for this paper at this time.
What Does the Distribution of Regret Look Like for Bandit Algorithms? In multiarmed bandit theory, the expected regret has long been the predominant performance measure studied in the literature. For the first time, in “The fragility of optimized bandit algorithms,” Fan and Glynn study the tail of the regret distribution. For a given bandit model, it is natural to optimize to achieve the minimum possible expected regret growth rate. However, the authors show that optimizing for expected regret alone can result in severe fragility. Optimized expected regret rates come with the side effect of extremely heavy regret tails (like those of a Cauchy distribution) and thus, significant chance of catastrophically large regret. Moreover, optimized algorithms essentially operate on a knife’s edge such that the slightest degree of model misspecification in algorithm design can result in a much larger expected regret rate. Among other insights, the authors characterize sharp trade-offs between the expected regret rate and the heaviness of the regret tail. They also show how to robustify an algorithm and lighten the regret tail through increased exploration.
Fan et al. (Mon,) studied this question.