Finite-time analysis of the multi-armed bandit problem with known trend

Abstract

We study a problem called “the multi-armed bandit problem with known trend”, in which an agent knows the shape of the reward function of each arm but not its distribution. The problem is motivated by various real-world tasks in which the reward received from an arm changes according to a known trend each time the arm is sampled. Adapting standard multi-armed bandit algorithms, we propose three algorithms and study their regret upper bounds: the first two assume a stochastic model, and the third is based on a Bayesian approach.
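
The abstract does not spell out the reward model or the algorithms. As a minimal sketch, assuming that arm k pays a Bernoulli reward whose success probability on its n-th pull is μ_k·f_k(n), with the trend f_k known and the base mean μ_k unknown, a UCB-style index can divide each observed reward by f_k(n) to get an unbiased estimate of μ_k, then multiply the optimistic estimate by the trend value at the arm's next pull. The function `trend_ucb`, the Bernoulli rewards and the decaying trend `trend` are hypothetical illustrations, not the paper's algorithms, and the Bayesian variant is not covered.

```python
import math
import random


def trend(n):
    # Hypothetical known trend: the reward shrinks as the arm is pulled more often.
    return 1.0 / math.sqrt(n)


def trend_ucb(means, trends, horizon, seed=0):
    """UCB-style sketch (assumed model, not the paper's method).

    Arm k pays 1 with probability means[k] * trends[k](n) on its n-th pull
    (n = 1, 2, ...). The agent knows trends[k] but not means[k].
    Returns the pull count of each arm and the total reward collected.
    """
    rng = random.Random(seed)
    k_arms = len(means)
    pulls = [0] * k_arms         # n_k: how many times arm k has been pulled
    scaled_sum = [0.0] * k_arms  # sum of r / f_k(n); its mean estimates mu_k

    def index(k, t):
        # Optimistic estimate of mu_k, scaled by the known trend at the next pull.
        mu_hat = scaled_sum[k] / pulls[k]
        bonus = math.sqrt(2.0 * math.log(t) / pulls[k])
        return trends[k](pulls[k] + 1) * (mu_hat + bonus)

    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k_arms:
            arm = t - 1  # initialisation: pull every arm once
        else:
            arm = max(range(k_arms), key=lambda k: index(k, t))
        n = pulls[arm] + 1
        f = trends[arm](n)
        reward = 1.0 if rng.random() < means[arm] * f else 0.0
        pulls[arm] = n
        scaled_sum[arm] += reward / f  # E[reward / f] = mu_k under the assumed model
        total += reward
    return pulls, total


pulls, total = trend_ucb([0.9, 0.5], [trend, trend], horizon=1000)
print(pulls, total)
```

Rescaling by 1/f_k(n) keeps the estimate of μ_k unbiased even though the raw rewards drift, which is the sense in which knowing the trend helps; with a decaying trend the index also steers the agent away from arms that have already been pulled often.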

Author: Djallel Bouneffouf

synapsesocial.com/papers/6a09371da2ba569cbf15f283
DOI: https://doi.org/10.1109/cec.2016.7744106