Thompson Sampling (TS) has become a prominent algorithmic approach in recent years. This review traces the evolution of TS and its variants, highlighting the innovative aspects of Neural Thompson Sampling (NeuralTS) and Meta-Thompson Sampling (Meta-TS), explaining the aggressive strategy used by Feel-Good Thompson Sampling (FGTS), and covering the introduction of Safe-LTS for the Linear Thompson Sampling (LTS) problem. The survey first systematically reviews the literature, then examines the theoretical underpinnings, algorithmic frameworks, and innovations of these TS variants, and finally offers insights into future directions. In short, NeuralTS handles high-dimensional reward functions through deep learning integration, Meta-TS uses meta-learning to adapt to unknown prior distributions, and FGTS applies an aggressive exploration strategy to handle pessimistic scenarios. The paper concludes that future research should emphasize enhancing generalizability, bridging the gap between theory and practice, and improving adaptability to complex and dynamic environments.
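For orientation, the basic Thompson Sampling loop that these variants build on can be sketched for a Bernoulli bandit with Beta posteriors. This is a minimal illustration and is not taken from the paper: the function name `thompson_sampling`, the arm means, and the uniform Beta(1, 1) prior are all assumptions chosen for the example.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson Sampling and return pull counts per arm.

    `true_means` are hypothetical success probabilities, unknown to the
    learner and used only to simulate rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    pulls = [0] * n_arms

    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta posterior
        # (assumed uniform Beta(1, 1) prior).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled means.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.1, 0.5, 0.9], horizon=2000))
```

Sampling from the posterior, rather than using its mean, is what drives exploration here: arms with few observations have wide posteriors and are occasionally sampled high enough to be tried.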
Junqing Yang (Fri,) studied this question.