
June 7, 2026 · Open Access

Phonetic Completeness Over Prosodic Diversity: Syllable-Level Synthetic Corpus Construction for Low-Resource Penang Hokkien Speech Synthesis

Key Points

Aimed to develop a Text-to-Speech (TTS) model for the low-resource Penang Hokkien dialect while addressing phonological sparsity.
Developed a two-stage fine-tuning approach: a first stage on synthetic data for phonetic coverage, followed by refinement of prosodic naturalness on real speech recordings.
Augmented a 45-minute real speech corpus with a 2-hour syllable-level concatenative synthetic corpus to cover approximately 2,000 unique syllable-tone combinations.
Applied technical optimizations, namely 600-ms cross-fading to mitigate boundary artifacts and numerical tone markers to reduce token sparsity.
Achieved a Mean Opinion Score of 3.92 for the synthesized speech.
Results suggest that improved syllable-tone coverage contributes substantially to intelligibility and tonal accuracy.
The optimizations reduced boundary artifacts and token sparsity, further improving model stability and synthesis quality.
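The tone-marker and coverage points above can be illustrated with a minimal Python sketch. The source does not name its romanization, so this assumes Pe̍h-ōe-jī (POJ)-style tone diacritics, hyphens between the syllables of a word, and the conventional POJ tone numbering; the helper names syllable_to_numbered, text_to_numbered and syllable_tone_coverage are hypothetical. Rewriting each diacritic-marked vowel as a plain letter plus a trailing tone digit lets the model see a small set of base letters and tone digits instead of many rare vowel-plus-diacritic characters, and the same numbered tokens make it easy to check which syllable-tone combinations a corpus actually contains.

```python
import unicodedata
from typing import Iterable, List, Tuple

# ASSUMPTION: conventional Pe̍h-ōe-jī (POJ) tone diacritics and tone numbers.
# The source does not say which romanization or numbering scheme was used.
DIACRITIC_TO_TONE = {
    "\u0301": 2,  # combining acute accent, as in "á"
    "\u0300": 3,  # combining grave accent, as in "à"
    "\u0302": 5,  # combining circumflex, as in "â"
    "\u0304": 7,  # combining macron, as in "ā"
    "\u030d": 8,  # combining vertical line above, as in "a̍"
}
# Stop codas (-p/-t/-k/-h) mark the checked tones 4 and 8.
CHECKED_FINALS = ("p", "t", "k", "h")


def syllable_to_numbered(syllable: str) -> str:
    """Move the tone diacritic of one POJ syllable to a trailing tone digit."""
    tone = None
    base_chars = []
    # NFD splits precomposed letters such as "á" into "a" + U+0301.
    for ch in unicodedata.normalize("NFD", syllable.lower()):
        if ch in DIACRITIC_TO_TONE:
            tone = DIACRITIC_TO_TONE[ch]
        else:
            base_chars.append(ch)  # keeps non-tone marks such as the dot in "o͘"
    base = unicodedata.normalize("NFC", "".join(base_chars))
    if tone is None:
        # Unmarked syllables: tone 4 if checked (-p/-t/-k/-h), otherwise tone 1.
        tone = 4 if base.endswith(CHECKED_FINALS) else 1
    return f"{base}{tone}"


def text_to_numbered(text: str) -> str:
    """Convert space-separated POJ words whose syllables are joined by hyphens."""
    return " ".join(
        "-".join(syllable_to_numbered(s) for s in word.split("-"))
        for word in text.split()
    )


def syllable_tone_coverage(
    numbered_texts: Iterable[str], inventory: Iterable[str]
) -> Tuple[float, List[str]]:
    """Return the covered fraction of an inventory and the missing syllable-tones."""
    target = set(inventory)
    if not target:
        return 1.0, []
    seen = {
        tok for text in numbered_texts for tok in text.replace("-", " ").split()
    }
    missing = sorted(target - seen)
    return 1.0 - len(missing) / len(target), missing
```

For example, text_to_numbered("lí hó bô") returns "li2 ho2 bo5". Passing the numbered real corpus and a full inventory (about 2,000 entries for Penang Hokkien, per the source; the list itself is not given here) to syllable_tone_coverage would list the syllable-tones that a synthetic corpus still has to supply.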

Abstract

This study presents the first Text-to-Speech (TTS) model for Penang Hokkien, a low-resource tonal dialect at risk of extinction. To address phonological sparsity in the collected speech corpus, we propose a two-stage fine-tuning approach that first emphasizes comprehensive phonetic coverage through syllable-level synthetic augmentation and then refines prosodic naturalness using real speech recordings. By supplementing a limited 45-minute real speech corpus with a 2-hour syllable-level concatenative synthetic corpus, the training data covers the full dialectal inventory of approximately 2,000 unique syllable-tone combinations. Experimental results suggest that improving syllable-tone coverage contributes substantially to intelligibility and tonal accuracy in this low-resource tonal setting. Technical optimizations, including a 600-ms cross-fading technique to mitigate boundary artifacts and numerical tone markers to reduce token sparsity, further improved model stability and synthesis quality. The final model achieved a Mean Opinion Score (MOS) of 3.92.
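The 600-ms cross-fade described in the abstract can be sketched as overlap-and-blend concatenation of mono sample sequences. Only the duration comes from the source; the linear fade shape, applying it at every boundary between consecutive clips, and clamping the overlap to the available samples are assumptions, and crossfade_concat is a hypothetical name. Plain Python lists keep the sketch dependency-free; a real pipeline would more likely operate on NumPy arrays.

```python
from typing import List, Sequence


def crossfade_concat(
    clips: Sequence[Sequence[float]], sample_rate: int, fade_ms: float = 600.0
) -> List[float]:
    """Concatenate mono clips, blending each boundary with a linear cross-fade.

    ASSUMPTION: the fade shape and per-boundary placement are illustrative;
    the source only specifies a 600-ms cross-fade.
    """
    if not clips:
        return []
    out: List[float] = list(clips[0])
    nominal = round(sample_rate * fade_ms / 1000.0)  # overlap length in samples
    for clip in clips[1:]:
        nxt = list(clip)
        # Clamp the overlap so it never exceeds the accumulated output
        # or the incoming clip.
        n = min(nominal, len(out), len(nxt))
        start = len(out) - n
        for i in range(n):
            # Fade-in weight for the incoming clip rises linearly from 1/(n+1)
            # to n/(n+1); the outgoing tail gets the complementary weight.
            w = (i + 1) / (n + 1)
            out[start + i] = (1.0 - w) * out[start + i] + w * nxt[i]
        out.extend(nxt[n:])  # append the part of the new clip after the overlap
    return out
```

At an assumed 16 kHz sampling rate, a 600-ms overlap is 9,600 samples, so each boundary shortens the output by that amount relative to plain concatenation, provided both sides are at least that long. An equal-power fade (sine and cosine weights) is a common alternative when the two sides are uncorrelated, since a linear fade can produce a slight loudness dip in the middle of the overlap.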
