Aphasia therapy for Mandarin-speaking patients presents distinct challenges because of the language's tonal characteristics and the presence of unintended vocal resonance, which reduces intelligibility and distorts tone contours. Current automatic speech feedback systems struggle to manage such distortions, especially in real-time and personalized clinical settings. This paper develops a novel framework, graph-based adaptive acoustic feedback control (GA-AFC), that integrates graph neural networks (GNNs) with reinforcement learning (RL) to dynamically model and suppress articulation-resonance mismatches in aphasic speech. Unlike black-box automatic speech recognition (ASR) systems and traditional autoregressive models, GA-AFC constructs an articulation-resonance graph from acoustic features such as harmonicity, pitch, energy, and Mel-frequency cepstral coefficients (MFCCs). The system uses GNN encoders to capture phoneme-tonal transitions and employs an RL policy to adapt acoustic feedback in real time. Experimental evaluations on three benchmark Mandarin datasets, namely Common Voice (Mandarin), AISHELL-1, and HKUST, demonstrate that GA-AFC achieves substantial improvements in both fluency enhancement and recognition accuracy. On aphasic speech, the model achieves an average word error rate (WER) reduction of 17.2% relative to Wav2Vec 2.0 and 30.1% relative to DeepSpeech, alongside a 14.8% improvement in tone classification accuracy on the HKUST corpus. For resonance suppression, GA-AFC reduces spectral deviation by 28.6% compared with baseline systems and achieves a mean opinion score (MOS) of 4.4 (±0.3) in subjective listening tests, surpassing all comparative models. Moreover, the system converges rapidly, with adaptation times under 20 s and feedback latencies under 140 ms, making it suitable for real-time clinical use.
The findings indicate that GA-AFC provides a responsive, adaptable, and clinically applicable framework for personalized speech feedback in Mandarin aphasia therapy and offers a novel approach to tone- and resonance-sensitive neural interventions in speech rehabilitation.
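The graph-construction and GNN-encoding steps summarized above can be illustrated with a minimal sketch. It assumes that each node is an analysis frame described by harmonicity, log-pitch, energy, and a few MFCCs, that edges link temporally adjacent frames, and that the encoder is a single mean-aggregation message-passing layer. These choices and all names (`frame_features`, `build_graph`, `gnn_layer`) are hypothetical illustrations, not the authors' GA-AFC implementation, and the RL feedback policy is not shown.

```python
import math
from typing import List

Vector = List[float]


def frame_features(harmonicity: float, pitch_hz: float, energy: float, mfccs: Vector) -> Vector:
    """Hypothetical node feature vector: [harmonicity, log-pitch, energy, MFCC_1..MFCC_k]."""
    # Log-compress pitch so tone contours share a scale with the other features (assumption);
    # unvoiced frames (pitch 0) get 0.0.
    log_pitch = math.log(pitch_hz) if pitch_hz > 0 else 0.0
    return [harmonicity, log_pitch, energy] + list(mfccs)


def build_graph(num_nodes: int, window: int = 1) -> List[List[int]]:
    """Adjacency lists linking each frame to neighbours within `window` frames, including itself."""
    adj = []
    for i in range(num_nodes):
        lo, hi = max(0, i - window), min(num_nodes - 1, i + window)
        adj.append(list(range(lo, hi + 1)))  # the range includes i, i.e. a self-loop
    return adj


def gnn_layer(x: List[Vector], adj: List[List[int]], weight: List[Vector]) -> List[Vector]:
    """One message-passing layer: neighbour mean, linear map (weight is out_dim x in_dim), ReLU."""
    in_dim = len(x[0])
    out = []
    for neighbours in adj:
        # Aggregate: element-wise mean of the neighbours' feature vectors.
        agg = [sum(x[j][d] for j in neighbours) / len(neighbours) for d in range(in_dim)]
        # Update: linear projection followed by ReLU.
        out.append([max(0.0, sum(row[d] * agg[d] for d in range(in_dim))) for row in weight])
    return out


# Usage with three toy frames (values are illustrative, not from the paper).
frames = [
    frame_features(0.8, 220.0, 0.5, [1.0, -0.5]),
    frame_features(0.6, 180.0, 0.4, [0.8, -0.2]),
    frame_features(0.2, 0.0, 0.1, [0.1, 0.0]),  # unvoiced frame
]
adj = build_graph(len(frames))
dim = len(frames[0])
identity = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
emb = gnn_layer(frames, adj, identity)
print(adj)  # [[0, 1], [0, 1, 2], [1, 2]]
```

With an identity weight matrix the layer reduces to a ReLU over neighbourhood means, which makes the smoothing of pitch and resonance cues across adjacent frames easy to inspect; a trained encoder would instead learn `weight` and would typically stack several layers.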
Zhang et al. studied this question.