Chained Information-Theoretic bounds and Tight Regret Rate for Linear Bandit Problems

Abstract

This paper studies the Bayesian regret of a variant of the Thompson-Sampling algorithm for bandit problems. It builds upon the information-theoretic framework of Russo and Van Roy, 2015 and, more specifically, on the rate-distortion analysis of Dong and Van Roy, 2020, who proved a bound with a regret rate of $O(d\sqrt{T \log T})$ for the $d$-dimensional linear bandit setting. We focus on bandit problems with a metric action space and, using a chaining argument, establish new bounds for a variant of Thompson-Sampling that depend on the metric entropy of the action space. Under a suitable continuity assumption on the rewards, our bound offers a tight rate of $O(d\sqrt{T})$ for $d$-dimensional linear bandit problems.
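
To make the gap between the two rates concrete, the sketch below evaluates $d\sqrt{T \log T}$ and $d\sqrt{T}$ with all hidden constants dropped. It is only an illustration of growth, not a computation of either bound: big-O notation hides constants, the natural logarithm is an arbitrary choice of base, and the function names and the value of `d` are hypothetical.

```python
import math


def rate_with_log(d: int, T: int) -> float:
    """Growth d * sqrt(T * log T) of the earlier rate, constants dropped."""
    return d * math.sqrt(T * math.log(T))


def rate_tight(d: int, T: int) -> float:
    """Growth d * sqrt(T) of the tight rate, constants dropped."""
    return d * math.sqrt(T)


def log_factor(T: int) -> float:
    """Ratio of the two rates, sqrt(log T); it does not depend on d."""
    return math.sqrt(math.log(T))


if __name__ == "__main__":
    d = 10  # hypothetical dimension, chosen only for illustration
    for T in (10**2, 10**4, 10**6):
        print(T, round(rate_with_log(d, T)), round(rate_tight(d, T)),
              round(log_factor(T), 2))
```

The ratio of the two expressions is $\sqrt{\log T}$, so the improvement is independent of the dimension and grows slowly with the horizon.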

Gouverneur et al. studied this question.

synapsesocial.com/papers/68e75b28b6db6435876d2636 https://doi.org/10.48550/arxiv.2403.03361