What question did this study set out to answer?

This research aims to develop a speech data collection system to enhance Korean speech recognition and diagnosis for atypical speakers.

May 14, 2026

Processing Korean atypical speech data for speech recognition and diagnosis

Key Points

The system pairs a mobile recording app with a web-based processing backend to crowd-source speech samples.
Data collection targets speakers who are often absent from public corpora: children, elderly adults, and individuals from clinical populations.
All recordings undergo structured post-processing through the web interface: segmentation, denoising, transcription, and annotation.
The resulting corpus covers the full range of Korean phonemes and reflects dialectal and stylistic variation, including spontaneous and informal speech.
The corpus is intended to support the training of robust ASR models and thereby improve recognition accuracy for underrepresented speaker groups.
The corpus also provides resources for clinical speech assessment and diagnosis, particularly for developmental and age-related speech conditions.
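
One way to check the phoneme-coverage claim above is to decompose each transcribed Hangul syllable into its component jamo and list the ones that never occur. The sketch below is illustrative only and is not taken from the study: it treats written jamo as a rough orthographic proxy for phonemes, and the function names `decompose` and `missing_jamo` are hypothetical. The decomposition arithmetic itself follows the Unicode layout of the Hangul Syllables block (U+AC00 to U+D7A3).

```python
# Unicode Hangul Syllables block: code = 0xAC00 + (initial * 21 + medial) * 28 + final
HANGUL_BASE = 0xAC00
HANGUL_LAST = 0xD7A3
N_MEDIAL = 21
N_FINAL = 28  # index 0 means "no final consonant"

INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 initial consonants
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 vowels


def decompose(syllable):
    """Split one precomposed Hangul syllable into (initial, medial, final_index).

    Returns None for any character outside the Hangul Syllables block.
    """
    code = ord(syllable)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        return None
    index = code - HANGUL_BASE
    initial, rest = divmod(index, N_MEDIAL * N_FINAL)
    medial, final = divmod(rest, N_FINAL)
    return INITIALS[initial], MEDIALS[medial], final


def missing_jamo(transcripts):
    """Hypothetical helper: list initials and medials that never occur in the transcripts."""
    seen_initials, seen_medials = set(), set()
    for text in transcripts:
        for ch in text:
            parts = decompose(ch)
            if parts is None:
                continue  # skip spaces, punctuation, Latin letters, digits
            seen_initials.add(parts[0])
            seen_medials.add(parts[1])
    return (sorted(set(INITIALS) - seen_initials),
            sorted(set(MEDIALS) - seen_medials))
```

Calling `missing_jamo` on a batch of transcripts returns two lists; empty lists would mean every initial consonant and vowel appears at least once in writing. Actual phonemic coverage would additionally need a pronunciation model, since Korean spelling and pronunciation diverge under regular sound-change rules.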

Abstract

We present a novel speech data collection system tailored for the Korean language, combining a mobile recording app with a web-based processing backend. The platform enables large-scale crowd-sourcing of speech samples and is designed to address the underrepresentation of Korean in existing AI corpora. Unlike conventional datasets, our system actively targets atypical and diverse speakers—children, elderly adults, and individuals from clinical populations—whose voices are often absent in publicly available resources. The collected corpus covers the full range of Korean phonemes and reflects dialectal and stylistic variation, including spontaneous and informal speech, which is crucial given Korean’s complex phonology and sociolinguistic diversity. Critically, all collected data undergo structured post-processing: recordings are segmented, denoised, transcribed, and annotated through the web interface to support both detailed acoustic analysis and the training of robust ASR models. This pipeline ensures data quality while enabling efficient monitoring, correction, and labeling. Beyond improving ASR accuracy for underrepresented speaker groups, the corpus also provides valuable resources for clinical speech assessment and diagnosis, particularly for developmental and age-related speech conditions. Our work underscores the importance of inclusive, language-specific data collection and processing frameworks for advancing AI speech technologies in non-English and culturally unique linguistic contexts.
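
The abstract names segmentation as the first post-processing step but does not say how it is done. As a minimal sketch under that caveat, the following shows one common baseline, energy-based silence splitting, written in plain Python. The function name `segment_by_energy` and all parameter values (frame length, threshold, minimum duration) are assumptions chosen for illustration, not details from the study.

```python
def segment_by_energy(samples, frame_len=160, threshold=0.01, min_frames=3):
    """Return (start, end) sample-index pairs for stretches of audible speech.

    samples    -- sequence of floats in [-1.0, 1.0] (mono audio)
    frame_len  -- samples per analysis frame (160 = 10 ms at 16 kHz, an assumption)
    threshold  -- mean squared amplitude above which a frame counts as speech
    min_frames -- shortest run of speech frames kept as a segment
    """
    segments = []
    start = None  # index of the first frame in the current speech run
    n_frames = len(samples) // frame_len

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold:
            if start is None:
                start = i
        elif start is not None:
            # A silent frame closes the current run; keep it if long enough.
            if i - start >= min_frames:
                segments.append((start * frame_len, i * frame_len))
            start = None

    # Close a run that extends to the end of the recording.
    if start is not None and n_frames - start >= min_frames:
        segments.append((start * frame_len, n_frames * frame_len))
    return segments
```

For example, a recording made of 800 silent samples, 1600 samples at amplitude 0.5, and 800 more silent samples yields the single segment `(800, 2400)`. Atypical speech (long pauses, low volume, disfluencies) is likely to need more careful thresholds or manual correction, which is consistent with the abstract's emphasis on monitoring and correction through the web interface.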
