In this study, we investigate the performance of multi-armed bandit algorithms in environments characterized by heavy-tailed and non-stationary reward distributions, a setting that deviates from the conventional risk-neutral and sub-Gaussian assumptions.
Pan et al. (Thu,) studied this question.