March 21, 2024 | Open Access

Comparative analysis of Sliding Window UCB and Discount Factor UCB in non-stationary environments: A Multi-Armed Bandit approach

Abstract

The Multi-Armed Bandit (MAB) problem is well studied in stationary environments, where each arm's reward distribution stays fixed over time. Many real-world applications, however, are non-stationary: the reward distribution of each arm can change. This study compares two leading algorithms designed for such environments, the Sliding Window Upper Confidence Bound (SW-UCB) and the Discount Factor UCB (DF-UCB). Using both simulated and real-world datasets, the evaluation covers adaptability, computational efficiency, and regret minimization. The findings indicate that SW-UCB adjusts quickly to abrupt shifts, whereas DF-UCB is the more resource-efficient option under gradual change. Both algorithms substantially outperform conventional UCB in non-stationary settings. These results are relevant to fields such as online advertising, healthcare, and finance, where adapting quickly to changing conditions is essential.
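To make the two approaches concrete, here is a minimal Python sketch of how a sliding-window UCB and a discounted UCB are commonly formulated. It is not taken from the paper: the class names, the `run` helper, the exploration constant `xi=0.6`, the window and discount values, and the assumption that rewards lie in [0, 1] are all illustrative choices, and the paper's exact variants and constants may differ.

```python
import math
import random
from collections import deque


class SlidingWindowUCB:
    """UCB computed only from the last `window` plays (rewards assumed in [0, 1])."""

    def __init__(self, n_arms, window, xi=0.6):
        self.n_arms = n_arms
        self.window = window
        self.xi = xi  # exploration constant (illustrative default)
        self.history = deque(maxlen=window)  # (arm, reward); old plays fall out
        self.t = 0

    def select(self):
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, reward in self.history:
            counts[arm] += 1
            sums[arm] += reward
        for arm in range(self.n_arms):
            if counts[arm] == 0:  # not seen inside the window: play it
                return arm
        log_term = math.log(min(self.t, self.window))

        def index(arm):
            bonus = math.sqrt(self.xi * log_term / counts[arm])
            return sums[arm] / counts[arm] + bonus

        return max(range(self.n_arms), key=index)

    def update(self, arm, reward):
        self.history.append((arm, reward))
        self.t += 1


class DiscountedUCB:
    """UCB with geometrically discounted counts and reward sums."""

    def __init__(self, n_arms, gamma, xi=0.6):
        self.n_arms = n_arms
        self.gamma = gamma  # discount factor in (0, 1]
        self.xi = xi
        self.counts = [0.0] * n_arms  # discounted number of plays per arm
        self.sums = [0.0] * n_arms    # discounted reward sum per arm

    def select(self):
        for arm in range(self.n_arms):
            if self.counts[arm] == 0.0:  # never played (or fully decayed)
                return arm
        log_term = math.log(sum(self.counts))  # total is >= 1 after any update

        def index(arm):
            bonus = 2.0 * math.sqrt(self.xi * log_term / self.counts[arm])
            return self.sums[arm] / self.counts[arm] + bonus

        return max(range(self.n_arms), key=index)

    def update(self, arm, reward):
        # Shrink all past evidence, then add the newest observation at weight 1.
        self.counts = [self.gamma * c for c in self.counts]
        self.sums = [self.gamma * s for s in self.sums]
        self.counts[arm] += 1.0
        self.sums[arm] += reward


def run(policy, reward_fn, steps):
    """Play `policy` for `steps` rounds; reward_fn(t, arm) returns the reward."""
    pulls = []
    for t in range(steps):
        arm = policy.select()
        policy.update(arm, reward_fn(t, arm))
        pulls.append(arm)
    return pulls


if __name__ == "__main__":
    random.seed(0)

    def bernoulli(t, arm):
        # Hypothetical abrupt change: the better arm swaps at t = 500.
        means = (0.8, 0.2) if t < 500 else (0.2, 0.8)
        return 1.0 if random.random() < means[arm] else 0.0

    for policy in (SlidingWindowUCB(2, window=100), DiscountedUCB(2, gamma=0.98)):
        pulls = run(policy, bernoulli, 1000)
        print(type(policy).__name__, "share of arm 1 in last 200 steps:",
              sum(pulls[-200:]) / 200)
```

In this sketch the sliding-window variant stores up to `window` past plays and rescans them on every selection, while the discounted variant keeps only two numbers per arm. That difference in state and per-step work is one plausible reading of the resource-efficiency contrast described above, though the paper's own measurements may rest on other factors.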

Cite This Study

Haochen Liu studied this question.

synapsesocial.com/papers/68e7309eb6db6435876aa624 https://doi.org/10.54254/2755-2721/49/20241077
