What type of study is this?

This is a Experimental Study study.

October 11, 2025Open Access

Heavy-tailed Linear Bandits: Adversarial Robustness, Best-of-both-worlds, and Beyond

Key Points

The proposed algorithm achieves an O(T^1/ε) worst-case regret in adversarial conditions.
Utilizing a new bonus function setup, it provides an O(log T) gap-dependent regret in stochastic scenarios.
The framework successfully extends to linear cases, achieving O(d^1/2 T^1/ε) regret.
The heavy-tailed noise aware stability-penalty matching guarantees success under specified conditions.

Abstract

Heavy-tailed bandits have been extensively studied since the seminal work of Bubeck2012BanditsWH. In particular, heavy-tailed linear bandits, enabling efficient learning with both a large number of arms and heavy-tailed noises, have recently attracted significant attention ShaoYKL18, XueWWZ20, ZhongHYW21, Wang2025heavy, tajdini2025improved. However, prior studies focus almost exclusively on stochastic regimes, with few exceptions limited to the special case of heavy-tailed multi-armed bandits (MABs) Huang0H22, ChengZ024, Chen2024uniINF. In this work, we propose a general framework for adversarial heavy-tailed bandit problems, which performs follow-the-regularized-leader (FTRL) over the loss estimates shifted by a bonus function. Via a delicate setup of the bonus function, we devise the first FTRL-type best-of-both-worlds (BOBW) algorithm for heavy-tailed MABs, which does not require the truncated non-negativity assumption and achieves an O (T^1{}) worst-case regret in the adversarial regime as well as an O (T) gap-dependent regret in the stochastic regime. We then extend our framework to the linear case, proposing the first algorithm for adversarial heavy-tailed linear bandits with finite arm sets. This algorithm achieves an O (d^1{2}T^1{}) regret, matching the best-known worst-case regret bound in stochastic regimes. Moreover, we propose a general data-dependent learning rate, termed heavy-tailed noise aware stability-penalty matching (HT-SPM). We prove that HT-SPM guarantees BOBW regret bounds for general heavy-tailed bandit problems once certain conditions are satisfied. By using HT-SPM and, in particular, a variance-reduced linear loss estimator, we obtain the first BOBW result for heavy-tailed linear bandits.

Read Full Paperexternally

Mark Helpful

Bookmark

Relay

View Full Paper