Video streaming has emerged as a dominant mode of entertainment and information dissemination. However, traditional approaches to video bitrate adaptation often fall short in non-stationary network environments, leading to suboptimal user experiences. To address these challenges, this work presents a method that uses Proximal Policy Optimization (PPO), a Reinforcement Learning (RL) algorithm, to optimize the Quality of Experience (QoE) in video streaming scenarios. A Network Digital Twin (NDT) environment, which emulates production environments, facilitates high-fidelity training and evaluation while completely isolating live network operations from experimental risks. The PPO agent learns to make bitrate decisions based on a reward function that incorporates key QoE metrics, including blockiness and block loss, achieving a 73.02% improvement in overall performance. Experimental results demonstrate a significant reduction in block loss (66.87%) and an improvement in blockiness (6.40%), together with an increase in streaming bitrate (17.18%), from training to production deployment.
Rio et al. (Wed,) studied this question.
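To make the idea of a QoE-driven reward concrete, the following is a minimal sketch of how such a reward might be shaped. It is not the reward used in this work: the linear form, the weights, the normalization of every input to [0, 1], the inclusion of a bitrate term, and the names `RewardWeights` and `qoe_reward` are all assumptions introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical coefficients; the actual values are not given in the source."""
    bitrate: float = 1.0
    blockiness: float = 1.0
    block_loss: float = 1.0


def qoe_reward(bitrate: float, blockiness: float, block_loss: float,
               weights: RewardWeights = RewardWeights()) -> float:
    """Illustrative reward: favor bitrate, penalize blockiness and block loss.

    All three inputs are assumed to be normalized to the range [0, 1].
    """
    for name, value in (("bitrate", bitrate),
                        ("blockiness", blockiness),
                        ("block_loss", block_loss)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return (weights.bitrate * bitrate
            - weights.blockiness * blockiness
            - weights.block_loss * block_loss)


# Example: a high-bitrate step with few artifacts scores above a degraded one.
clean_step = qoe_reward(bitrate=0.8, blockiness=0.1, block_loss=0.0)
degraded_step = qoe_reward(bitrate=0.8, blockiness=0.4, block_loss=0.3)
```

Under these assumptions, an agent is rewarded for raising the bitrate only as long as doing so does not increase the visual artifacts, which is the trade-off that bitrate adaptation has to manage.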