What question did this study set out to answer?

This research aims to develop a robust framework for suppressing unintended vocal resonance in Mandarin-speaking aphasic patients.

March 2, 2026 | Open Access

Adaptive acoustic feedback control in aphasia therapy: A graph-based learning approach for unintended resonance suppression in Mandarin (Chinese)-speaking aphasic patients

Key Points

Developed the graph-based adaptive acoustic feedback control (GA-AFC) system.
Integrated graph neural networks with reinforcement learning for real-time feedback adaptation.
Constructed an articulation-resonance graph based on various acoustic features.
Conducted evaluations on three benchmark Mandarin datasets to test system effectiveness.
Achieved a 17.2% reduction in word error rate relative to Wav2Vec 2.0.
Achieved a 30.1% reduction in word error rate relative to DeepSpeech.
Improved tone classification accuracy by 14.8% on the HKUST corpus.
Logged a spectral deviation improvement of 28.6% compared to baseline systems.
Achieved a mean opinion score of 4.4 in subjective listening tests.
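
The word error figures above are relative reductions against a baseline system. As a rough illustration of what such a figure means, the sketch below computes word error rate from token-level edit distance and then the relative reduction between two systems. The function names and the example rates are hypothetical; the source does not report absolute error rates, and the tokenization used in the study is not stated here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline_wer, system_wer):
    """Fraction by which the system lowers the baseline error rate."""
    return (baseline_wer - system_wer) / baseline_wer


# Hypothetical rates, chosen only to show the arithmetic.
baseline, system = 0.25, 0.20
print(f"relative reduction: {relative_reduction(baseline, system):.1%}")
```

Under this definition, a 17.2% relative reduction means the new error rate is 82.8% of the baseline's, not that the error rate dropped by 17.2 percentage points.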

Abstract

Aphasia therapy for Mandarin-speaking patients presents distinct challenges because of the language’s tonal characteristics and the presence of unintended vocal resonance, which reduces intelligibility and distorts tone contours. Current automatic speech feedback systems struggle to manage such distortions, especially in real-time and customized clinical contexts. This paper develops a novel framework, named graph-based adaptive acoustic feedback control (GA-AFC), that integrates graph neural networks (GNNs) with reinforcement learning (RL) to dynamically model and suppress articulation-resonance mismatches in aphasic speech. Unlike black-box automatic speech recognition (ASR) and traditional autoregressive models, GA-AFC constructs an articulation-resonance graph based on acoustic features such as harmonicity, pitch, energy, and Mel-frequency cepstral coefficients (MFCCs). The system uses GNN encoders to capture phoneme-tonal transitions and employs an RL policy to adapt acoustic feedback in real time. Experimental evaluations on three benchmark Mandarin datasets, namely Common Voice (Mandarin), AISHELL-1, and HKUST, demonstrate that GA-AFC achieves substantial improvements in both fluency enhancement and recognition accuracy. For aphasic speech, the model achieves an average word error rate (WER) reduction of 17.2% relative to Wav2Vec 2.0 and 30.1% relative to DeepSpeech, alongside a 14.8% improvement in tone classification accuracy on the HKUST corpus. Regarding resonance suppression, GA-AFC improves on the spectral deviation of baseline systems by 28.6% and achieves a mean opinion score (MOS) of 4.4 (±0.3) in subjective listening tests, which surpasses all comparative models. Moreover, the system converges rapidly, with adaptation times of less than 20 s and feedback latencies under 140 ms, making it suitable for real-time clinical use.
The findings indicate that GA-AFC provides a responsive, adaptable, and clinically applicable framework for customizable speech feedback in Mandarin aphasia therapy, proposing a novel approach to tone- and resonance-sensitive neural interventions in speech rehabilitation.
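
The abstract describes building an articulation-resonance graph from per-frame acoustic features and encoding it with a GNN, but it does not spell out the construction. The following is a minimal sketch of one plausible reading, not the authors' method: each frame is a node, edges join temporally adjacent frames plus each frame's most similar other frame, and a single mean-aggregation step stands in for a GNN layer. The feature values, the edge rule, and all function names are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(frames, k=1):
    """Return an undirected adjacency dict {node: set of neighbours}.

    Edges link temporally adjacent frames plus each frame's k most
    similar other frames (an assumed rule, not taken from the paper).
    """
    n = len(frames)
    adj = {i: set() for i in range(n)}
    for i in range(n - 1):              # temporal edges
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    for i in range(n):                  # similarity edges
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: cosine(frames[i], frames[j]),
                        reverse=True)
        for j in ranked[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def mean_aggregate(frames, adj):
    """One message-passing step: average each node with its neighbours."""
    out = []
    for i, feats in enumerate(frames):
        group = [feats] + [frames[j] for j in sorted(adj[i])]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


# Hypothetical per-frame features: [pitch_hz, energy, harmonicity, mfcc_1]
frames = [
    [220.0, 0.60, 0.90, 1.20],
    [225.0, 0.62, 0.88, 1.10],
    [180.0, 0.30, 0.40, -0.50],
    [222.0, 0.61, 0.89, 1.15],
]
graph = build_graph(frames, k=1)
smoothed = mean_aggregate(frames, graph)
```

In a real pipeline the features would likely be normalized before computing similarity, since raw pitch in hertz dominates the cosine here, and the aggregation would use learned weights. The reinforcement learning policy that adapts the feedback is outside the scope of this sketch.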

Cite This Study

Zhang et al. studied this question.

synapsesocial.com/papers/69a52dabf1e85e5c73bf0a8e https://doi.org/10.1016/j.eij.2026.100908
