December 1, 2025

Optimizing Deep Neural Networks for EEG-Based Speech Recognition: A Multimodal Approach to Assistive Communication

Key Points

NeuroSpeech improves speech recognition accuracy in individuals with speech disorders, achieving near-perfect scores under clean conditions.
In evaluations, feature extraction methods significantly impacted performance, with F1 scores of 0.986 for Spanish commands and 0.837 for English phonemes.
Complexity analysis showed that NeuroSpeech is lightweight (1-30M parameters), enabling efficient, low-latency deployment.
The findings highlight NeuroSpeech's potential role in assistive technologies, advancing communication for those with speech impairments.

Abstract

Speech recognition for individuals with impairments remains a significant challenge due to atypical speech patterns that confound traditional acoustic-only models. This study introduces NeuroSpeech, a novel multimodal framework that integrates electroencephalography (EEG) with acoustic features to improve recognition accuracy, robustness, and efficiency. A large-scale random search identified optimal EEG encoder configurations and feature extraction parameters, with window size and overlap (p < 0.001) emerging as critical factors. Explainable AI (XAI) methods, specifically SHAP, provided insights into model decision-making, supporting interpretability and clinical translation. Evaluations were conducted on two publicly available datasets: Spanish commands and vowels (UNLP-CONICET) and English phonemes and words (KaraOne). Under clean conditions, NeuroSpeech achieved near-perfect accuracy (F1 = 0.986 on Spanish; 0.837 on English), while in noisy conditions (SNR = 0.5) it maintained strong performance (F1 = 0.92 and 0.70), demonstrating EEG's role as a noise-robust complementary signal. In contrast, Whisper, a state-of-the-art ASR model, showed severe degradation under noise (e.g., F1 dropping from 0.81 to 0.46). Finally, complexity analysis showed that NeuroSpeech is lightweight (1-30M parameters) with inference latency of 10-18 ms/sample (RTF < 1 on CPU and GPU), enabling near-real-time deployment. These results demonstrate NeuroSpeech's significant potential to leverage neural information to augment speech that is compromised, offering a promising advancement for assistive technologies and improved communication for individuals with speech disorders.
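
The abstract names window size and overlap as the critical feature extraction parameters. As a rough illustration of what those two parameters control, the sketch below computes the sample ranges of overlapping windows over a recording. It is not the authors' pipeline: the function name `window_bounds`, the convention of expressing overlap as a fraction of the window, and the example values (1000 samples, 256-sample windows, 50% overlap) are all assumptions made for illustration.

```python
def window_bounds(n_samples, window_size, overlap):
    """Return (start, end) sample indices for overlapping windows.

    window_size is in samples; overlap is the fraction of each window
    shared with the next one (0 <= overlap < 1). Trailing samples that
    do not fill a whole window are dropped.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    # Hop between consecutive window starts; at least one sample.
    step = max(1, round(window_size * (1 - overlap)))
    last_start = n_samples - window_size
    return [(s, s + window_size) for s in range(0, last_start + 1, step)]


# Hypothetical values, not taken from the paper.
bounds = window_bounds(n_samples=1000, window_size=256, overlap=0.5)
print(len(bounds), bounds[-1])  # 6 (640, 896)

# Slicing one channel (a plain list here) into windows.
channel = list(range(1000))
windows = [channel[start:end] for start, end in bounds]
```

Larger overlap shortens the hop between windows, so the same recording yields more, and more strongly correlated, windows. That trade-off is one plausible reason a hyperparameter search would be sensitive to both settings.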
