March 5, 2024 | Open Access

LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits

Abstract

This study considers the linear contextual bandit problem with independent and identically distributed (i.i.d.) contexts. For this problem, existing studies have proposed Best-of-Both-Worlds (BoBW) algorithms whose regrets satisfy $O(\log^2(T))$ for the number of rounds $T$ in a stochastic regime with a suboptimality gap lower-bounded by a positive constant, while satisfying $O(\sqrt{T})$ in an adversarial regime. However, the dependency on $T$ has room for improvement, and the suboptimality-gap assumption can be relaxed. To address this, this study proposes an algorithm whose regret satisfies $O(\log(T))$ when the suboptimality gap is lower-bounded. Furthermore, we introduce a margin condition, a milder assumption on the suboptimality gap. That condition characterizes the problem difficulty linked to the suboptimality gap using a parameter $\beta \in (0, \infty]$. We then show that the algorithm's regret satisfies $O\left(\{\log(T)\}^{\frac{1+\beta}{2+\beta}} T^{\frac{1}{2+\beta}}\right)$. Here, $\beta = \infty$ corresponds to the case in the existing studies where the suboptimality gap has a lower bound, and our regret satisfies $O(\log(T))$ in that case. Our proposed algorithm is based on Follow-The-Regularized-Leader with the Tsallis entropy and is referred to as the $\alpha$-Linear-Contextual (LC)-Tsallis-INF.
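
As a rough numerical illustration of how the margin-condition bound interpolates between regimes, the sketch below evaluates only the leading-order rate, with all constants and problem-dependent factors dropped. It writes beta for the margin parameter, so the rate is {log T}^((1+beta)/(2+beta)) * T^(1/(2+beta)). The function name `regret_rate` and the sample values of T and beta are illustrative assumptions, not taken from the paper, and the code does not implement the algorithm itself.

```python
import math


def regret_rate(T: int, beta: float) -> float:
    """Leading-order rate {log T}^((1+beta)/(2+beta)) * T^(1/(2+beta)).

    Constants are dropped; this is an illustration of the exponents only.
    beta = math.inf is treated as the lower-bounded-gap case, giving log T.
    """
    if T < 2:
        raise ValueError("T must be at least 2 so that log(T) > 0")
    if beta <= 0:
        raise ValueError("beta must lie in (0, inf]")
    if math.isinf(beta):
        # Both exponents reach their limits: (1+beta)/(2+beta) -> 1, 1/(2+beta) -> 0.
        return math.log(T)
    log_exponent = (1.0 + beta) / (2.0 + beta)
    t_exponent = 1.0 / (2.0 + beta)
    return math.log(T) ** log_exponent * T ** t_exponent


if __name__ == "__main__":
    T = 10**6  # hypothetical horizon chosen for illustration
    for beta in (0.1, 1.0, 10.0, math.inf):
        print(f"beta={beta}: rate ~ {regret_rate(T, beta):.2f}")
```

For a fixed horizon, the rate shrinks as beta grows: small beta gives a rate close to sqrt(T log T), and beta = infinity recovers the log T rate of the lower-bounded-gap case.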

Cite This Study

Kato et al. (2024) studied this question.

synapsesocial.com/papers/68e75b28b6db6435876d2565
https://doi.org/10.48550/arxiv.2403.03219
