Aphasia is an everyday communication and speech disorder that impairs an individual's ability to express themselves through writing and speech. This paper explores the potential of using automatic speech evaluation models such as the Whisper model to evaluate aphasia and, potentially, other speech impairments. Though effective, traditional methods for aphasia assessment are time-consuming and require specialized clinical expertise. To address these challenges, the study fine-tunes the Whisper model on the AphasiaBank dataset to create a more efficient and accessible evaluation tool. The first fine-tuning trial focused on the phonemic transcript generation part of the Whisper model and achieved a low accuracy of 56.89%, mainly because of minor token prediction errors and word omissions. The second trial focused on the model’s prediction structure, included prompt and correction tokens, and achieved an improved accuracy of 70.76%. This indicates that contextual information and correctness tokens can significantly enhance the model’s performance. Because only a sample of the AphasiaBank dataset was available for this paper, further research should train the model on the entire dataset. The results show that AI models such as the Whisper model have potential as alternative tools for aphasia testing and evaluation. Deployed as an app or tool, such a model could help offset the scarcity of speech-language pathologists (SLPs) and make assessment more accessible to all individuals struggling with communication disorders.
Lee et al. (Sat,) studied this question.