We study a problem called “the multi-armed bandit problem with known trend”, in which an agent knows the shape of the reward function of each arm but not its distribution. This problem is motivated by several real-world tasks in which, each time the model samples an arm, the received reward changes according to a known trend. By adapting standard multi-armed bandit algorithms, we study the regret upper bounds of three algorithms: the first two assume a stochastic model, and the third is based on a Bayesian approach.
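The abstract does not spell out the reward model, so the following is only a minimal sketch under an assumed model. It is not one of the paper's three algorithms. The assumption is that the n-th pull of arm i returns `trends[i](n) * X`, where `X ~ Bernoulli(means[i])` has an unknown mean and the trend function is known and positive. Under that assumption, a UCB1-style agent can divide the known trend out of each observed reward to estimate the unknown mean. It then scales the optimistic estimate by the trend value of the arm's next pull. The function name `trend_ucb` and its parameters are hypothetical.

```python
import math
import random
from typing import Callable, List


def trend_ucb(
    trends: List[Callable[[int], float]],
    means: List[float],
    horizon: int,
    seed: int = 0,
) -> List[int]:
    """Hypothetical trend-adjusted UCB1 sketch (assumed model, not the paper's).

    Assumption: the n-th pull (n = 1, 2, ...) of arm i returns
    trends[i](n) * X with X ~ Bernoulli(means[i]); trends[i] is known to the
    agent and strictly positive, means[i] is unknown and only used to simulate.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(trends)
    pulls = [0] * k
    sums = [0.0] * k  # running sums of de-trended rewards (estimate means[i])

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            def index(i: int) -> float:
                mu_hat = sums[i] / pulls[i]
                bonus = math.sqrt(2.0 * math.log(t) / pulls[i])
                # Optimistic mean scaled by the known trend of the next pull.
                return trends[i](pulls[i] + 1) * (mu_hat + bonus)

            arm = max(range(k), key=index)

        n = pulls[arm] + 1
        scale = trends[arm](n)
        reward = scale * (1.0 if rng.random() < means[arm] else 0.0)
        # Remove the known trend so the running average targets means[arm].
        sums[arm] += reward / scale
        pulls[arm] = n
    return pulls
```

For example, `trend_ucb([lambda n: 1.0 / n, lambda n: 0.5], [1.0, 1.0], horizon=1000)` pulls the second arm far more often. The first arm's reward decays as 1/n, so after a few pulls its trend-scaled index falls below that of the constant arm. Multiplying the confidence bound by the next trend value is one simple way to use the known shape. A stochastic or Bayesian variant, such as Thompson sampling on the de-trended rewards, could replace the index without changing the de-trending step.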
Djallel Bouneffouf studied this question.