Despite the widespread adoption of Automatic Speech Recognition (ASR) models in voice-operated products and conversational AI agents, current ASR models perform poorly for people who stutter. One primary cause of the performance disparity is the lack of representative stuttered speech data during the development of ASR models. This work introduces the first stuttered speech dataset in Mandarin Chinese, created by a grassroots community of Chinese-speaking people who stutter to facilitate the development of inclusive and fair speech AI. Collected from 72 speakers with a wide range of stuttering characteristics, this dataset contains speech samples of both spontaneous conversations and voice command dictations from each speaker. Our analysis of the dataset shows the diversity and variability of stuttered utterances captured, highlighting its unique value in authentically representing the stuttering community in AI data. Leveraging this dataset, we benchmark popular ASR models to understand their potential biases against disfluent speech.
Li et al. studied this question.