ABSTRACT Conventional video streaming remains the dominant playback format in multimedia, and current research focuses largely on optimizing adaptive bitrate (ABR) algorithms to improve users' quality of experience (QoE). However, most existing approaches consider only single‐user scenarios, whereas in practice multiple users share the bandwidth of a bottleneck link. Existing methods do not jointly consider the multiple factors that influence QoE and pay insufficient attention to QoE fairness: even when bandwidth is allocated fairly, fairness in user QoE remains difficult to ensure. Furthermore, these ABR algorithms adapt on bitrate alone, which limits the control dimensions and prevents fine‐grained ABR decisions. In addition, some methods require extra control equipment to obtain the global network state needed for fairness, which increases deployment complexity in existing networks. To address QoE fairness in multi‐user video streaming, this paper models the problem as a Markov decision process (MDP) in which multiple agents cooperatively and fairly allocate limited bottleneck link resources, and proposes an ABR algorithm based on multi‐agent reinforcement learning. The algorithm incorporates several improved multi‐agent deep reinforcement learning (MADRL) techniques to jointly select the optimal chunk bitrate and download delay for each user, enabling fine‐grained control of chunk download strategies. This paper also designs and implements a video streaming distribution framework that does not rely on additional network‐assisted devices. The framework efficiently acquires global client state, overcomes performance disparities among clients, and supports centralized, scalable ABR decision‐making.
Experimental results show that, compared with existing methods, the proposed approach shifts the CDF curve of average user QoE significantly to the right at the 50th percentile, that is, the median of average user QoE is higher. It also selects appropriate bitrate strategies for different types of devices. As a result, the total transmitted data volume is reduced by 30.4% to 54.3%, which optimizes bandwidth occupancy while ensuring user QoE fairness.
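To make the distinction between bandwidth fairness and QoE fairness concrete, the following minimal sketch computes a per‐chunk QoE score and a fairness index over per‐user QoE values. It is an illustration under stated assumptions, not the QoE model or fairness metric used in this paper: the linear QoE form (bitrate utility minus rebuffering and bitrate‐switch penalties), the penalty weights, the use of Jain's fairness index, and the names `chunk_qoe` and `jain_index` are all hypothetical choices made for the example.

```python
from typing import Sequence


def chunk_qoe(
    bitrate_mbps: float,
    prev_bitrate_mbps: float,
    rebuffer_s: float,
    rebuf_penalty: float = 4.3,   # illustrative weight, not taken from the paper
    smooth_penalty: float = 1.0,  # illustrative weight, not taken from the paper
) -> float:
    """Assumed linear QoE: bitrate utility minus rebuffering and switching penalties."""
    return (
        bitrate_mbps
        - rebuf_penalty * rebuffer_s
        - smooth_penalty * abs(bitrate_mbps - prev_bitrate_mbps)
    )


def jain_index(values: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in (0, 1] for non-negative x.

    Returns 1.0 when all values are equal; approaches 1/n when one user
    receives everything.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if any(v < 0 for v in values):
        raise ValueError("Jain's index assumes non-negative values")
    sum_sq = sum(v * v for v in values)
    if sum_sq == 0:
        return 1.0  # all users are at zero, hence equal
    total = sum(values)
    return (total * total) / (len(values) * sum_sq)


if __name__ == "__main__":
    # Two users receive the same bitrate, but one of them stalls for 0.5 s.
    qoe_a = chunk_qoe(bitrate_mbps=3.0, prev_bitrate_mbps=3.0, rebuffer_s=0.0)
    qoe_b = chunk_qoe(bitrate_mbps=3.0, prev_bitrate_mbps=3.0, rebuffer_s=0.5)
    print("bitrate fairness:", jain_index([3.0, 3.0]))
    print("QoE fairness:", jain_index([qoe_a, qoe_b]))
```

In this toy case both users download at the same bitrate, so the bitrate allocation is perfectly fair, yet the user who stalls ends up with a lower QoE score and the QoE fairness index drops below 1. Jain's index assumes non‐negative inputs, so a QoE model that can go negative would need shifting or a different fairness measure.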
Liu et al. (Sun,) studied this question.