Speech recognition for individuals with speech impairments remains a significant challenge due to atypical speech patterns that confound traditional acoustic-only models. This study introduces NeuroSpeech, a novel multimodal framework that integrates electroencephalography (EEG) with acoustic features to improve recognition accuracy, robustness, and efficiency. A large-scale random search identified optimal EEG encoder configurations and feature extraction parameters, with window size and overlap (p < 0.001) emerging as critical factors. Explainable AI (XAI) methods, specifically SHAP, provided insights into model decision-making, supporting interpretability and clinical translation. Evaluations were conducted on two publicly available datasets: Spanish commands and vowels (UNLP-CONICET) and English phonemes and words (KaraOne). Under clean conditions, NeuroSpeech achieved near-perfect accuracy (F1 = 0.986 on Spanish; 0.837 on English), while in noisy conditions (SNR = 0.5) it maintained strong performance (F1 = 0.92 and 0.70), demonstrating EEG's role as a noise-robust complementary signal. In contrast, Whisper, a state-of-the-art ASR model, showed severe degradation under noise (e.g., F1 dropping from 0.81 to 0.46). Finally, complexity analysis showed that NeuroSpeech is lightweight (1-30M parameters) with an inference latency of 10-18 ms/sample (RTF < 1 on CPU and GPU), enabling near-real-time deployment. These results demonstrate NeuroSpeech's significant potential to leverage neural information to augment compromised speech, offering a promising advancement for assistive technologies and improved communication for individuals with speech disorders.
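Two quantities named in the abstract, windowing with overlap and the real-time factor (RTF), can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy signal, and the 1.0 s sample duration are hypothetical, and the abstract does not state the actual window sizes, overlaps, or sample lengths used.

```python
def sliding_windows(signal, window_size, overlap):
    """Split a 1-D sequence into fixed-length windows.

    overlap is the fraction of each window shared with the next one
    (0.5 means consecutive windows share half of their samples).
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    # Hop size between window starts; at least one sample.
    step = max(1, int(round(window_size * (1 - overlap))))
    return [
        signal[start:start + window_size]
        for start in range(0, len(signal) - window_size + 1, step)
    ]


def real_time_factor(processing_seconds, signal_seconds):
    """RTF = processing time / signal duration; RTF < 1 is faster than real time."""
    return processing_seconds / signal_seconds


# Toy stand-in for a single EEG channel (hypothetical data).
eeg = list(range(10))
windows = sliding_windows(eeg, window_size=4, overlap=0.5)

# Upper end of the reported latency (18 ms) against an ASSUMED 1.0 s sample.
rtf = real_time_factor(processing_seconds=0.018, signal_seconds=1.0)
```

Under these assumptions the toy signal yields four windows with a hop of two samples, and the RTF stays well below 1. The real figure depends on the true sample durations, which the abstract does not give.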
Das et al. (Mon,) studied this question.