Web3 technologies enable novel forms of real-time digital human streaming media by supporting both high-fidelity transmission and interactive user engagement. However, Real-Time Streaming Interactive Digital Humans (RTSIDHs) remain vulnerable to network instability, which causes buffering, latency, visual degradation, and audio–video desynchronization that substantially impair user Quality of Experience (QoE). To characterize how users perceive these distortions, we present RDHQA, the first large-scale RTSIDH Quality Assessment dataset. RDHQA comprises 134 representative interaction scenarios with eight digital human avatars as high-quality references, along with 1,340 distorted samples generated by simulating five common streaming degradations. Based on extensive subjective evaluations, we further propose SAV-PF, an audio–visual quality assessment method built on the human foundation model Sapiens and informed by cognitive principles such as the primacy effect and the forgetting curve. Experimental results show that SAV-PF outperforms existing objective QoE assessment approaches and predicts user experience more accurately. This work is open-sourced at https://github.com/zyj-2000/RDHQA under the CC BY-NC 4.0 licence.
Zhou et al. (Thu,) studied this question.