Despite recent advances in automatic speech recognition (ASR), performance on Korean-accented English remains limited because speech data that captures the accent's pronunciation and prosodic variations is scarce. To address this, we propose a synthetic data generation framework for improving Whisper-based ASR. Synthetic speech is generated by converting English text, via an intermediate International Phonetic Alphabet (IPA) representation, into Hangul-based phonetic transcriptions that reflect the phonological characteristics of Korean-accented English. The ASR model is then fine-tuned using Low-Rank Adaptation (LoRA) on a mixture of synthetic and authentic speech data. Experiments show relative reductions of up to 16.40% in character error rate (CER), 14.93% in word error rate (WER), and 14.81% in phoneme error rate (PER) compared with the pretrained baseline.
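The reported gains are relative reductions in edit-distance-based error rates. As a minimal sketch of how such figures are typically computed, the following assumes standard Levenshtein scoring. The function names, the toy sentences, and the choice to count spaces as characters in CER are illustrative assumptions, since the abstract does not state the exact scoring or normalization conventions.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(ref: str, hyp: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    return error_rate(ref.split(), hyp.split())


def cer(ref: str, hyp: str) -> float:
    """Character error rate; spaces count as characters (an assumption)."""
    return error_rate(list(ref), list(hyp))


def relative_reduction(baseline: float, adapted: float) -> float:
    """Relative error reduction in percent with respect to the baseline."""
    return 100.0 * (baseline - adapted) / baseline


# Toy example (hypothetical transcripts, not data from the paper).
reference = "the light is right"
baseline_wer = wer(reference, "the right is light")  # 2 of 4 words wrong: 0.5
adapted_wer = wer(reference, "the light is light")   # 1 of 4 words wrong: 0.25
gain = relative_reduction(baseline_wer, adapted_wer) # 50.0 percent
```

PER follows the same pattern when `error_rate` is applied to phoneme sequences instead of words or characters. Note that a relative reduction of, say, 16.40% in CER means the adapted model's CER is 83.60% of the baseline's, not that the CER dropped by 16.40 percentage points.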
Jang et al. studied this question.