August 24, 2025 | Open Access

Advancing Arabic ASR for Disordered Speech: Fine-Tuning Wav2Vec2 on Egyptian Dysarthric Speech

Key Points

Fine-tuning reduced word error rate from 0.8516 to 0.3736 and character error rate from 0.5756 to 0.3478, relative reductions of roughly 56% and 40%.
The dataset comprised about 1,300 utterances from an Egyptian Arabic speaker with speech impairments, enabling personalized ASR adaptation.
The methodology covered data preprocessing, fine-tuning, and evaluation of the model using word error rate and character error rate.
The results suggest that personalized, fine-tuned ASR models can improve accessibility for Arabic speakers with speech disorders, and they point to the need for more diverse datasets.
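
To put the reported error rates in perspective, the short sketch below computes the relative reduction implied by the before and after figures. The helper name `relative_reduction` is ours, chosen for illustration; only the four error rates come from the paper.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original error removed: (before - after) / before."""
    if before <= 0:
        raise ValueError("baseline error rate must be positive")
    return (before - after) / before


# Error rates as reported in the paper (baseline vs. fine-tuned model).
wer_gain = relative_reduction(0.8516, 0.3736)
cer_gain = relative_reduction(0.5756, 0.3478)

print(f"Relative WER reduction: {wer_gain:.1%}")  # 56.1%
print(f"Relative CER reduction: {cer_gain:.1%}")  # 39.6%
```

Even after fine-tuning, a WER of 0.3736 still means that roughly one word in three is wrong, so the relative gain is best read alongside the absolute figures.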

Abstract

Despite significant advances in Automatic Speech Recognition (ASR), its application to low-resource languages such as Arabic—especially for speakers with speech disorders—remains underdeveloped. This study presents a novel approach to Arabic ASR for disordered speech by fine-tuning a Wav2Vec2 model on a personalized dataset comprising approximately 1,300 utterances from an Egyptian Arabic speaker with speech impairments. Building on the comparative foundation set by Alsohby (2025), which evaluated four state-of-the-art ASR models across general, dysarthric, and accented speech, we extend the analysis through specialized model adaptation. Our methodology encompasses data preprocessing, fine-tuning, and evaluation using Word Error Rate (WER) and Character Error Rate (CER). Results indicate a substantial performance gain, reducing WER from 0.8516 to 0.3736 and CER from 0.5756 to 0.3478. These findings demonstrate the effectiveness of personalized fine-tuning and underscore the critical need for diverse, domain-specific datasets to improve ASR accessibility for Arabic speakers with speech impairments.
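
For readers unfamiliar with the two metrics, here is a minimal sketch of how WER and CER are commonly defined: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, over words or characters respectively. It is not the paper's evaluation code. The function names are hypothetical, and it assumes whitespace tokenization and counts spaces as characters in CER, which may differ from the authors' setup.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; assumes whitespace tokenization (an assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces are counted as characters (an assumption)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


print(wer("the cat sat", "the cat sit"))  # one substitution out of three words
print(cer("abcd", "abxd"))                # one substitution out of four characters
```

Because the numerator counts insertions as well as substitutions and deletions, both metrics can exceed 1.0 when the hypothesis is much longer than the reference.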
