What question did this study set out to answer?

The aim is to enhance automatic speech recognition for Korean-accented English by generating synthetic speech data.

March 28, 2026 | Open Access

Enhancing Korean-Accented English ASR with Transliteration-Based Data Synthesis

Key Points

The aim is to enhance automatic speech recognition for Korean-accented English by generating synthetic speech data.
Developed a synthetic data generation framework using Hangul-based phonetic transcriptions.
Used an intermediate IPA representation to capture phonological characteristics of Korean-accented English.
Fine-tuned a Whisper-based ASR model using Low-Rank Adaptation on a mixture of synthetic and authentic speech data.
Achieved a relative reduction of up to 16.40% in character error rate compared to the pretrained baseline.
Achieved a relative reduction of up to 14.93% in word error rate compared to the pretrained baseline.
Achieved a relative reduction of up to 14.81% in phoneme error rate compared to the pretrained baseline.

Abstract

Despite recent advances in automatic speech recognition (ASR), performance remains limited for Korean-accented English due to the limited availability of accent-specific speech data, including pronunciation and prosodic variations. To address this limitation, we propose a synthetic data generation framework for improving Whisper-based ASR performance. Synthetic speech is generated by converting English text into Hangul-based phonetic transcriptions using an intermediate IPA representation to reflect the phonological characteristics of Korean-accented English. The ASR model is fine-tuned using Low-Rank Adaptation with a mixture of synthetic and authentic speech data. Experimental results demonstrate relative reductions of up to 16.40% in the character error rate, 14.93% in the word error rate, and 14.81% in the phoneme error rate compared to the pretrained baseline.
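The reported figures are relative reductions in error rate, so it may help to see how such a number is derived. The sketch below is illustrative only and is not the authors' evaluation code: it computes word and character error rates with a standard Levenshtein edit distance and then the relative reduction between two systems. The function names and the example transcripts are hypothetical and were chosen for illustration. A phoneme error rate would follow the same pattern over phoneme sequences.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (substitutions, insertions, and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(ref, hyp):
    """Word error rate over whitespace-separated tokens."""
    return error_rate(ref.split(), hyp.split())


def cer(ref, hyp):
    """Character error rate over raw characters (spaces included)."""
    return error_rate(list(ref), list(hyp))


def relative_reduction(baseline, improved):
    """Relative reduction, e.g. 0.1640 corresponds to 16.40%."""
    return (baseline - improved) / baseline


# Hypothetical transcripts, not taken from the paper.
reference = "the light is right"
baseline_hyp = "the right is light"    # two word substitutions
fine_tuned_hyp = "the light is light"  # one word substitution

baseline_wer = wer(reference, baseline_hyp)
fine_tuned_wer = wer(reference, fine_tuned_hyp)
wer_reduction = relative_reduction(baseline_wer, fine_tuned_wer)
```

Note that a relative reduction is measured against the baseline error rate, not in absolute percentage points: a drop from 50% to 25% WER in this toy example is a 50% relative reduction.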

Cite This Study

Jang et al. studied this question.

synapsesocial.com/papers/69c772818bbfbc51511e30d4 https://doi.org/10.3390/electronics15071380
