September 10, 2025 | Open Access

Comparative Analysis of Classical and LinUCB Bandits in Recommender Systems Based on Cumulative Regret and Reward

Key Points

LinUCB achieves the highest cumulative reward of 45,000 ± 350, outperforming other algorithms.
ETC has the highest initial reward at 2050 ± 50 in cold start conditions but is unstable.
Thompson Sampling demonstrates strong adaptability and maintains low regret in dynamic environments.
UCB incurs high cumulative regret because it is sensitive to sparse feedback.
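
The cumulative reward and cumulative regret quoted in the key points can be made concrete with a small simulation. The sketch below is not the study's code: it runs the textbook UCB1 rule on a simulated Bernoulli bandit and tracks both metrics. The arm means, horizon, seed, and the function name `run_ucb1` are illustrative assumptions, and regret is computed as pseudo-regret (the gap in expected reward), which may differ from the definition used in the study.

```python
import math
import random


def run_ucb1(arm_means, horizon, seed=0):
    """Run textbook UCB1 on a simulated Bernoulli bandit.

    arm_means are hypothetical click probabilities, not MovieLens data.
    Returns (cumulative_reward, cumulative_pseudo_regret).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # total observed reward per arm
    best_mean = max(arm_means)
    cumulative_reward = 0.0
    cumulative_regret = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            # Empirical mean plus an exploration bonus that shrinks with pulls.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        cumulative_reward += reward
        # Pseudo-regret: expected reward lost by not playing the best arm.
        cumulative_regret += best_mean - arm_means[arm]

    return cumulative_reward, cumulative_regret


if __name__ == "__main__":
    reward, regret = run_ucb1([0.2, 0.5, 0.8], horizon=2000)
    print(f"cumulative reward: {reward:.0f}, cumulative regret: {regret:.1f}")
```

A lower cumulative regret means the algorithm spent fewer rounds on suboptimal arms. Cumulative reward additionally depends on the noise in the observed feedback, which is why results like those above are reported with a ± spread.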

Abstract

Using the real-world MovieLens 1M dataset, this study systematically compares the empirical performance of five representative Multi-Armed Bandit (MAB) algorithms in a recommender-system setting: Explore-Then-Commit (ETC), Upper Confidence Bound (UCB), asymptotically optimal UCB, Thompson Sampling (TS), and Linear Upper Confidence Bound (LinUCB). The experimental design covers both short-term cold-start and long-term stable interaction environments, and further tests each algorithm's robustness under sparse feedback and non-stationary conditions. ETC achieves the highest initial reward (2050 ± 50) in the cold-start phase but is highly unstable. Although UCB has good theoretical convergence guarantees, it is noticeably sensitive to sparse feedback, which results in high cumulative regret. TS adapts well and remains robust in dynamic environments, consistently maintaining low regret. LinUCB outperforms the other algorithms in long-term personalized recommendation, achieving the highest cumulative reward (45,000 ± 350) and the lowest cumulative regret (1420 ± 15) over more than 10,000 rounds of experiments. Wilcoxon signed-rank tests and bootstrap resampling confirm that the performance differences between the algorithms are statistically significant (p < 0.01).
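
To illustrate how LinUCB uses context, which the classical bandits in this comparison do not, here is a minimal sketch of the disjoint LinUCB rule with one ridge-regression model per arm. It is an assumption-laden illustration, not the study's implementation: the abstract does not say which LinUCB variant, feature set, or exploration weight was used, so `LinUCBArm`, `simulate`, `alpha`, the one-hot user context, and the `true_theta` weights are all hypothetical. The inverse matrix is maintained with the Sherman-Morrison formula to keep the example free of third-party dependencies.

```python
import math
import random


class LinUCBArm:
    """One arm of disjoint LinUCB, storing A^{-1} and b explicitly."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha  # exploration weight (hypothetical value)
        # A starts as the identity (ridge regularisation), so A^{-1} = I.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.a_inv]

    def ucb(self, x):
        """Upper confidence bound: theta^T x + alpha * sqrt(x^T A^{-1} x)."""
        theta = self._a_inv_times(self.b)  # ridge estimate A^{-1} b
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(xi * v for xi, v in zip(x, a_inv_x)), 0.0))
        return mean + self.alpha * width

    def update(self, x, reward):
        """Rank-one update A += x x^T, b += reward * x (Sherman-Morrison)."""
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        dim = len(x)
        for i in range(dim):
            for j in range(dim):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        for i in range(dim):
            self.b[i] += reward * x[i]


def simulate(horizon=2000, seed=0):
    """Simulated two-arm contextual bandit; returns (reward, pseudo-regret)."""
    rng = random.Random(seed)
    # Hypothetical per-arm weights: arm 0 suits user type 0, arm 1 suits type 1.
    true_theta = [[0.8, 0.1], [0.1, 0.8]]
    arms = [LinUCBArm(dim=2, alpha=1.0) for _ in true_theta]
    cumulative_reward = 0.0
    cumulative_regret = 0.0
    for _ in range(horizon):
        # Context: one-hot encoding of a randomly drawn user type.
        x = [1.0, 0.0] if rng.random() < 0.5 else [0.0, 1.0]
        expected = [sum(w * xi for w, xi in zip(theta, x)) for theta in true_theta]
        chosen = max(range(len(arms)), key=lambda a: arms[a].ucb(x))
        reward = 1.0 if rng.random() < expected[chosen] else 0.0
        arms[chosen].update(x, reward)
        cumulative_reward += reward
        cumulative_regret += max(expected) - expected[chosen]
    return cumulative_reward, cumulative_regret


if __name__ == "__main__":
    reward, regret = simulate()
    print(f"cumulative reward: {reward:.0f}, cumulative regret: {regret:.1f}")
```

Because the best arm here depends on the user type, a context-free algorithm such as UCB or ETC would have to settle on a single arm for all users. That is one plausible reason a contextual method can reach lower long-term regret in personalized recommendation.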
