Heavy-tailed bandits have been extensively studied since the seminal work of Bubeck2012BanditsWH. In particular, heavy-tailed linear bandits, enabling efficient learning with both a large number of arms and heavy-tailed noises, have recently attracted significant attention ShaoYKL18, XueWWZ20, ZhongHYW21, Wang2025heavy, tajdini2025improved. However, prior studies focus almost exclusively on stochastic regimes, with few exceptions limited to the special case of heavy-tailed multi-armed bandits (MABs) Huang0H22, ChengZ024, Chen2024uniINF. In this work, we propose a general framework for adversarial heavy-tailed bandit problems, which performs follow-the-regularized-leader (FTRL) over the loss estimates shifted by a bonus function. Via a delicate setup of the bonus function, we devise the first FTRL-type best-of-both-worlds (BOBW) algorithm for heavy-tailed MABs, which does not require the truncated non-negativity assumption and achieves an O (T^1{}) worst-case regret in the adversarial regime as well as an O (T) gap-dependent regret in the stochastic regime. We then extend our framework to the linear case, proposing the first algorithm for adversarial heavy-tailed linear bandits with finite arm sets. This algorithm achieves an O (d^1{2}T^1{}) regret, matching the best-known worst-case regret bound in stochastic regimes. Moreover, we propose a general data-dependent learning rate, termed heavy-tailed noise aware stability-penalty matching (HT-SPM). We prove that HT-SPM guarantees BOBW regret bounds for general heavy-tailed bandit problems once certain conditions are satisfied. By using HT-SPM and, in particular, a variance-reduced linear loss estimator, we obtain the first BOBW result for heavy-tailed linear bandits.
Zhao et al. (Tue,) studied this question.