We consider a bandit problem consisting of a sequence of n choices from an infinite number of Bernoulli arms, with n → ∞. The objective is to minimize the long-run failure rate. The Bernoulli parameters are independent observations from a distribution F. We first assume F to be the uniform distribution on (0, 1) and consider various extensions. In the uniform case we show that the best lower bound for the expected failure proportion is between √2/√n and 2/√n, and we exhibit classes of strategies that achieve the latter.
Berry et al. studied this question.
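To make the setup concrete, the following is a minimal simulation sketch of the problem described above: arms whose success probabilities are drawn independently from the uniform distribution on (0, 1), and a horizon of n pulls. The abstract does not say which strategies it analyzes, so the rule used here is an assumption chosen purely for illustration: pull a fresh arm until it fails, and commit permanently to the first arm that produces m successes in a row. The function name `simulate_run_strategy`, the run length `m`, and the `seed` parameter are all hypothetical.

```python
import random


def simulate_run_strategy(n, m, seed=0):
    """Simulate n pulls and return the observed failure proportion.

    Illustrative rule (an assumption, not taken from the abstract):
    keep pulling the current arm; if it fails before reaching m
    consecutive successes, discard it and draw a new arm; once an arm
    reaches m consecutive successes, use it for all remaining pulls.
    """
    rng = random.Random(seed)
    failures = 0
    committed = False
    p = rng.random()  # success probability of the current arm, F = Uniform(0, 1)
    run = 0           # consecutive successes on the current arm

    for _ in range(n):
        if rng.random() < p:
            run += 1
            if run >= m:
                committed = True  # stay on this arm from now on
        else:
            failures += 1
            if not committed:
                p = rng.random()  # discard the arm and draw a fresh one
                run = 0

    return failures / n


if __name__ == "__main__":
    n = 10_000
    m = 100  # illustrative run length, chosen near the square root of n
    print(simulate_run_strategy(n, m, seed=1))
```

Averaging the returned proportion over many seeds gives a rough empirical estimate of the expected failure proportion for this particular rule; it says nothing about other strategies or other choices of F.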