With the rapid development of video streaming services, adaptive bitrate (ABR) algorithms have become a core technology for ensuring optimal viewing experiences. Traditional ABR strategies, predominantly rule-based or reinforcement learning-driven, typically employ uniform quality assessment metrics that overlook users’ subjective preference differences regarding factors such as video quality and stalling. To address this limitation, this paper proposes an adaptive video bitrate selection system that integrates preference modeling with reinforcement learning. By incorporating a preference learning module, the system models and scores user viewing trajectories, using these scores to replace conventional rewards and guide the training of the Proximal Policy Optimization (PPO) algorithm, thereby achieving policy optimization that better aligns with users’ perceived experiences. Simulation results on DASH network bandwidth traces demonstrate that the proposed optimization method improves overall Quality of Experience (QoE) by over 9% compared to other mainstream algorithms.
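The preference-scoring idea can be illustrated with a minimal sketch. The abstract does not specify the architecture of the preference learning module, so everything below is an assumption made for illustration: the three trajectory features, the linear scoring function, the Bradley-Terry pairwise loss, and all function names are hypothetical and are not taken from the paper.

```python
import math

def trajectory_features(bitrates, rebuffers):
    """Summarize one viewing trajectory (hypothetical feature set):
    total bitrate, total stall time, total bitrate-switch magnitude."""
    switches = sum(abs(b - a) for a, b in zip(bitrates, bitrates[1:]))
    return [sum(bitrates), sum(rebuffers), switches]

def score(weights, feats):
    """Linear preference score; a stand-in for a learned scoring model."""
    return sum(w * f for w, f in zip(weights, feats))

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def train_preference_model(pairs, lr=0.01, epochs=200):
    """Fit weights with a Bradley-Terry pairwise logistic loss (assumed loss).

    pairs: list of (preferred_features, rejected_features).
    """
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for win, lose in pairs:
            diff = [a - b for a, b in zip(win, lose)]
            p = sigmoid(score(weights, diff))  # P(preferred beats rejected)
            # Gradient ascent on log p: d/dw log(sigmoid(w.diff)) = (1 - p) * diff
            weights = [w + lr * (1.0 - p) * d for w, d in zip(weights, diff)]
    return weights

# Hand-made trajectories for one hypothetical user (bitrates in Mbps, stalls in s).
smooth = trajectory_features([2, 2, 2, 2], [0, 0, 0, 0])
stalling = trajectory_features([2, 2, 2, 2], [0, 1, 0, 1])
low_quality = trajectory_features([1, 1, 1, 1], [0, 0, 0, 0])
oscillating = trajectory_features([1, 3, 1, 3], [0, 0, 0, 0])

# This user prefers the smooth, stall-free, higher-bitrate trajectory each time.
pairs = [(smooth, stalling), (smooth, low_quality), (smooth, oscillating)]
weights = train_preference_model(pairs)

def preference_reward(bitrates, rebuffers):
    """Score that would replace a conventional QoE reward during policy training."""
    return score(weights, trajectory_features(bitrates, rebuffers))
```

In a setup like this, `preference_reward` would be passed to the PPO training loop in place of a fixed QoE formula, so the policy is optimized against what a given user was observed to prefer rather than against uniform weights.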
Z.C. Feng
Yazhi Liu
Hao Zhang
Electronics
North China University of Science and Technology
Feng et al. studied this question.
www.synapsesocial.com/papers/68c1ae7754b1d3bfb60e69cd — DOI: https://doi.org/10.3390/electronics14153103