Key points are not available for this paper at this time.
This study considers the linear contextual bandit problem with independent and identically distributed (i. i. d. ) contexts. In this problem, existing studies have proposed Best-of-Both-Worlds (BoBW) algorithms whose regrets satisfy O (² (T) ) for the number of rounds T in a stochastic regime with a suboptimality gap lower-bounded by a positive constant, while satisfying O (T) in an adversarial regime. However, the dependency on T has room for improvement, and the suboptimality-gap assumption can be relaxed. For this issue, this study proposes an algorithm whose regret satisfies O ( (T) ) in the setting when the suboptimality gap is lower-bounded. Furthermore, we introduce a margin condition, a milder assumption on the suboptimality gap. That condition characterizes the problem difficulty linked to the suboptimality gap using a parameter (0, ]. We then show that the algorithm's regret satisfies O (\ (T) \^1+{2+}T^1{2+}). Here, = corresponds to the case in the existing studies where a lower bound exists in the suboptimality gap, and our regret satisfies O ( (T) ) in that case. Our proposed algorithm is based on the Follow-The-Regularized-Leader with the Tsallis entropy and referred to as the -Linear-Contextual (LC) -Tsallis-INF.
Kato et al. (Tue,) studied this question.