Automatic Speech Recognition (ASR) systems often fail to accommodate the diverse speech patterns of children with speech disorders, and the resulting inaccuracies undermine their usability in diagnostic, therapeutic, and educational applications. The problem stems from bias in existing models, which are trained primarily on adult and non-disordered speech and therefore generalize poorly, and unfairly, to this population. To address these limitations, we propose a framework centered on domain-agnostic feature extraction and adversarial training. The feature extractor is designed to learn audio representations that are not tied to any particular domain, so that diverse speech inputs can be processed accurately. Adversarial debiasing is the key mechanism: the system is optimized to minimize label prediction error while being penalized for relying on domain-dependent features. Divergence-aware data augmentation enriches the training data so that the model handles variation in speech patterns. In addition, continual-learning strategies such as synaptic intelligence and experience replay help retain previously learned knowledge during iterative model updates. This approach has the potential to make ASR systems more equitable and reliable tools for educators, therapists, and caregivers who support children with speech disorders, and to advance the inclusivity of speech recognition technologies.
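The adversarial debiasing objective described in the abstract can be illustrated with a small sketch. The abstract does not give the loss, so the form below (label loss minus a weighted domain loss, as in gradient-reversal-style domain-adversarial training) is an assumption, and the names `extractor_objective`, `domain_head_objective`, and the weight `lam` are hypothetical.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target])


def extractor_objective(label_probs, label, domain_probs, domain, lam=0.1):
    """Loss seen by the feature extractor (assumed form, not the paper's).

    The extractor is rewarded for low label error and for HIGH domain error:
    if the domain head cannot tell the domains apart (for example adult vs.
    child speech), the features carry little domain-specific information.
    """
    label_loss = cross_entropy(label_probs, label)
    domain_loss = cross_entropy(domain_probs, domain)
    return label_loss - lam * domain_loss


def domain_head_objective(domain_probs, domain):
    """The domain head itself is trained normally, to minimize domain error."""
    return cross_entropy(domain_probs, domain)


# Same label prediction, two different domain-head outputs.
at_chance = extractor_objective([0.8, 0.2], 0, [0.5, 0.5], 0)
confident = extractor_objective([0.8, 0.2], 0, [0.9, 0.1], 0)
print(at_chance < confident)  # features that hide the domain score better
```

Under this assumed form, the extractor and the domain head pull in opposite directions on the domain loss, and `lam` controls how strongly domain invariance is traded against label accuracy.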
Shrivastava et al. (Tue,) studied this question.