Using the real-world MovieLens 1M dataset, this study systematically compares the empirical performance of five representative Multi-Armed Bandit (MAB) algorithms in a recommender-system setting: Explore-Then-Commit (ETC), Upper Confidence Bound (UCB), asymptotically optimal UCB, Thompson Sampling (TS), and Linear Upper Confidence Bound (LinUCB). The experimental design covers both short-term cold-start and long-term stable interaction environments, and further tests the robustness of each algorithm under sparse feedback and non-stationary conditions. ETC achieves the highest initial reward (2050 ± 50) in the cold-start phase but is highly unstable. Although UCB has good theoretical convergence guarantees, it is clearly sensitive to sparse feedback, which results in high cumulative regret. TS adapts well and remains robust in dynamic environments, maintaining consistently low regret. LinUCB outperforms the other algorithms in long-term personalized recommendation tasks, achieving the highest cumulative reward (45,000 ± 350) and the lowest cumulative regret (1420 ± 15) in more than 10,000 rounds of experiments. The Wilcoxon signed-rank test and bootstrap resampling are used to verify that the performance differences between the algorithms are statistically significant (p < 0.01).
Z. Li (Wed,) studied this question.
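To make the notion of cumulative regret used in the abstract concrete, the following is a minimal sketch of two of the compared algorithm families (UCB and Thompson Sampling) on a toy problem. It is not the study's setup: the Bernoulli arms, their means, the horizon, and the function names `run_ucb1` and `run_thompson` are all assumptions chosen for illustration, and the UCB variant shown is the textbook UCB1 bonus sqrt(2 ln t / n), which may differ from the variant evaluated in the paper.

```python
import math
import random


def run_ucb1(means, horizon, rng):
    """Run textbook UCB1 on Bernoulli arms; return cumulative pseudo-regret.

    `means` are hypothetical arm success probabilities (not MovieLens data).
    """
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # total observed reward per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialisation: pull every arm once
        else:
            # empirical mean plus exploration bonus sqrt(2 ln t / n)
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]   # expected loss of this pull
    return regret


def run_thompson(means, horizon, rng):
    """Run Beta-Bernoulli Thompson Sampling; return cumulative pseudo-regret."""
    k = len(means)
    alpha = [1.0] * k     # Beta(1, 1) uniform prior on each arm
    beta = [1.0] * k
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        # sample one plausible mean per arm from its posterior, play the largest
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        if rng.random() < means[arm]:
            alpha[arm] += 1.0
        else:
            beta[arm] += 1.0
        regret += best - means[arm]
    return regret


if __name__ == "__main__":
    toy_means = [0.2, 0.5, 0.8]   # assumed values, for illustration only
    print("UCB1 regret:", run_ucb1(toy_means, 2000, random.Random(0)))
    print("TS regret:  ", run_thompson(toy_means, 2000, random.Random(0)))
```

Regret here is the pseudo-regret (the sum of gaps between the best arm's mean and the chosen arm's mean), so a policy that pulled arms uniformly at random would accumulate about 600 over 2,000 rounds in this toy instance. Both sketches should stay well below that, which is the qualitative behaviour the abstract refers to when it reports low cumulative regret.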