In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a time-invariant set of alternatives and receives the payoff associated with th...
Aleksandrs Slivkins studied this question.
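The round-by-round interaction described in the abstract can be illustrated with a small simulation. The sketch below is not the method of the paper: it uses a generic epsilon-greedy rule chosen only for illustration, and the function name `run_bandit`, the Bernoulli payoff model, and the parameters `payoff_means`, `epsilon`, and `seed` are all assumptions introduced here.

```python
import random


def run_bandit(payoff_means, rounds, epsilon=0.1, seed=0):
    """Simulate a fixed-arm bandit with an illustrative epsilon-greedy rule.

    payoff_means: hypothetical success probability of each arm (Bernoulli payoffs).
    Returns (pull counts per arm, total payoff collected).
    """
    rng = random.Random(seed)
    k = len(payoff_means)
    counts = [0] * k      # how many times each arm was chosen
    totals = [0.0] * k    # summed payoff observed per arm
    total_payoff = 0.0

    for _ in range(rounds):
        # Explore at random with probability epsilon, or while some arm is untried.
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(k)
        else:
            # Otherwise exploit the arm with the best empirical mean so far.
            arm = max(range(k), key=lambda a: totals[a] / counts[a])

        # The algorithm only observes the payoff of the arm it chose.
        payoff = 1.0 if rng.random() < payoff_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += payoff
        total_payoff += payoff

    return counts, total_payoff


if __name__ == "__main__":
    counts, total = run_bandit([0.2, 0.5, 0.8], rounds=1000)
    print("pulls per arm:", counts)
    print("total payoff:", total)
```

The set of arms stays the same in every round, matching the time-invariant set of alternatives in the problem statement. Only the chosen arm's payoff is revealed, which is what forces a trade-off between trying arms and sticking with the best one seen so far.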