AI-driven severity assessment techniques for dysarthric disorders show promise in aiding speech-language pathologists with diagnosis and therapeutic follow-up. Existing solutions generally focus on the average intelligibility and hoarseness of an individual speaker's speech (i.e., speaker-level classification). This potentially ignores the slight variations in pronunciation attributable to the speaker's dysarthric disorder, e.g., between /t/ and /d/. To address this issue, we rethink the inherent differences in dysarthric speech and propose a non-intrusive severity assessment approach called DysarNet. Specifically, we first design a prosodic emphasis module based on frame-level speech features to highlight fine-grained temporal changes, including pronunciation content, rhythm, and timing. Second, we design a multi-scale aggregation strategy to collect statistical cues on articulatory information at different scales, i.e., frame-level and utterance-level. In this way, multi-scale prosodic and articulatory cues directly assist the prediction network in assessing dysarthria severity from multiple views, and the model naturally generalizes across speakers. Experimental results on the VCC 2018 and TORGO datasets show that DysarNet excels in assessing dysarthria severity.
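The abstract does not specify how the multi-scale aggregation is computed. As a rough illustration of the general idea only, the sketch below pools per-dimension mean and standard deviation over the whole utterance and over short local windows, then concatenates the two. The function names `stat_pool` and `multi_scale_pool`, the window size, and the toy feature values are all assumptions made for this example and are not taken from DysarNet.

```python
from math import sqrt


def stat_pool(frames):
    """Per-dimension mean and population std over a list of frame vectors."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


def multi_scale_pool(frames, window=3):
    """Concatenate utterance-level stats with averaged local-window stats.

    Hypothetical helper: the windowing scheme is an assumption, not the
    aggregation strategy described in the paper.
    """
    # Coarse scale: statistics over every frame in the utterance.
    utterance = stat_pool(frames)
    # Fine scale: statistics over non-overlapping windows, averaged across windows.
    chunks = [frames[i:i + window] for i in range(0, len(frames), window)]
    local = [stat_pool(c) for c in chunks]
    local_avg = [
        sum(v[k] for v in local) / len(local) for k in range(len(local[0]))
    ]
    return utterance + local_avg


# Toy input: 6 frames with 2 features each (illustrative values only).
frames = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0], [6.0, 1.0], [8.0, 1.0], [10.0, 1.0]]
embedding = multi_scale_pool(frames, window=3)
```

For this toy input the two scales agree on the mean of the first feature but not on its spread: the utterance-level standard deviation is larger than the averaged local one, which is the kind of complementary cue a multi-scale representation can expose to a downstream predictor.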
Ganjun Liu
Xiaohui Hou
Ge Meng
National University of Singapore
Tianjin University
National University Health System
Liu et al. studied this question.
www.synapsesocial.com/papers/68e6a879b6db64358762b06b — DOI: https://doi.org/10.1145/3589335.3651449