What type of study is this?

This is an experimental study.

October 8, 2025 | Open Access

Enhancing Low-Resource Dialectal ASR in Indonesian Using Speech-Transformer Models and Data Augmentation

Key Points

Data augmentation substantially improved recognition accuracy for low-resource dialectal speech.
Average accuracy improvements of 57.6%, 57.9%, and 59.3% were recorded for character error rate (CER), word error rate (WER), and sentence error rate (SER), respectively, compared with recognition without augmentation.
The study applied four augmentation methods (time stretching, pitch shifting, noise addition, and gain) within a Speech-Transformer architecture.
The findings highlight the role of data augmentation in improving model generalization and mitigating overfitting.
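CER, WER, and SER are conventionally computed from the edit distance between a recognized transcript and its reference. The minimal Python sketch below shows one common way to do this. The study does not describe its scoring procedure, so the function names, the choice to count spaces as characters in CER, the word-level definition of a sentence error, and the `relative_improvement` formula (relative error-rate reduction versus a no-augmentation baseline) are all assumptions; the Indonesian sentences are hypothetical examples, not data from the study.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions + deletions + insertions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (or a match at no cost)
            )
        prev = curr
    return prev[-1]


def error_rates(references, hypotheses):
    """Corpus-level CER, WER and SER, each in percent."""
    char_errors = char_total = word_errors = word_total = sentence_errors = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()
        char_errors += edit_distance(ref, hyp)  # character level, spaces included (assumption)
        char_total += len(ref)
        word_errors += edit_distance(ref_words, hyp_words)
        word_total += len(ref_words)
        sentence_errors += ref_words != hyp_words  # any word mismatch fails the sentence
    return (
        100 * char_errors / char_total,
        100 * word_errors / word_total,
        100 * sentence_errors / len(references),
    )


def relative_improvement(baseline_rate, augmented_rate):
    """Relative error-rate reduction in percent (hypothetical definition of 'improvement')."""
    return 100 * (baseline_rate - augmented_rate) / baseline_rate


refs = ["saya pergi ke pasar", "terima kasih"]  # hypothetical reference transcripts
hyps = ["saya pergi ka pasar", "terima kasih"]  # hypothetical recognizer output
cer, wer, ser = error_rates(refs, hyps)
print(f"CER={cer:.2f}% WER={wer:.2f}% SER={ser:.2f}%")  # CER=3.23% WER=16.67% SER=50.00%
```

A single wrong character ("ka" for "ke") costs one character edit out of 31, one word out of 6, and one whole sentence out of 2. This is why SER typically reacts most strongly to small recognition errors.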

Abstract

One of the main challenges in speech recognition research is limited data, especially for low-resource languages. A common strategy for improving model performance is to expand the training data through data augmentation. Data augmentation has proven effective at increasing the amount of training data and reducing the mismatch between training and test conditions. It is also essential for improving the performance of deep neural networks, because it mitigates overfitting and enhances their ability to generalize. This study compares the effect of several standard augmentation techniques (time stretching, pitch shifting, noise addition, and gain), applied to low-resource dialectal speech, on recognition performance with a Speech-Transformer architecture. The dataset consists of Indonesian dialectal speech. Compared with speech recognition without any data augmentation, the average accuracy improvement was 57.6% for Character Error Rate (CER), 57.9% for Word Error Rate (WER), and 59.3% for Sentence Error Rate (SER).
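The four augmentation techniques named above can be sketched in plain Python operating on a list of waveform samples. This is a minimal illustration, not the study's implementation: the study does not state which tools or parameter ranges it used, so the gain, SNR, stretch rate, and semitone values are hypothetical. The time stretch here is a naive overlap-add (OLA) method, a simplified stand-in for the phase-vocoder or WSOLA algorithms that audio libraries usually provide, and it can introduce phasing artifacts on tonal signals. Pitch shifting is built the standard way: stretch the duration, then resample back to the original length.

```python
import math
import random


def apply_gain(signal, gain_db):
    """Gain: scale the waveform by gain_db decibels (+6 dB is roughly 2x amplitude)."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in signal]


def add_noise(signal, snr_db, rng=random):
    """Noise addition: white Gaussian noise at a target signal-to-noise ratio in dB."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_std = math.sqrt(signal_power / 10 ** (snr_db / 10))
    return [s + rng.gauss(0.0, noise_std) for s in signal]


def time_stretch(signal, rate, frame=512, hop=128):
    """Time stretching via naive overlap-add (OLA).

    Hann-windowed frames are read every hop * rate input samples and written every
    hop output samples, so duration scales by about 1/rate (rate > 1 is faster)
    while each frame keeps its original waveform, and hence roughly its pitch.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    out_len = int(len(signal) / rate)
    out = [0.0] * (out_len + frame)
    weight = [0.0] * (out_len + frame)
    for t_out in range(0, out_len, hop):
        start = int(round(t_out * rate))  # matching read position in the input
        for n in range(frame):
            if start + n >= len(signal):
                break
            out[t_out + n] += signal[start + n] * window[n]
            weight[t_out + n] += window[n]
    # Divide by the summed window so overlapping frames do not change the level.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out[:out_len], weight[:out_len])]


def resample_linear(signal, step):
    """Read the signal every `step` samples using linear interpolation."""
    out, pos = [], 0.0
    while pos < len(signal) - 1:
        i = int(pos)
        frac = pos - i
        out.append(signal[i] * (1 - frac) + signal[i + 1] * frac)
        pos += step
    return out


def pitch_shift(signal, n_steps):
    """Pitch shifting by n_steps semitones while keeping roughly the same duration."""
    factor = 2 ** (n_steps / 12)                  # frequency ratio for n_steps semitones
    stretched = time_stretch(signal, 1 / factor)  # about `factor` times longer, same pitch
    return resample_linear(stretched, factor)     # back to ~original length, pitch x factor


if __name__ == "__main__":
    sr = 16000
    rng = random.Random(0)
    clip = [0.3 * math.sin(2 * math.pi * 220 * t / sr) for t in range(sr)]  # 1 s test tone
    variants = {
        "gain": apply_gain(clip, 6.0),           # hypothetical +6 dB
        "noise": add_noise(clip, 20.0, rng),     # hypothetical 20 dB SNR
        "stretch": time_stretch(clip, 1.1),      # hypothetical 10% faster
        "pitch": pitch_shift(clip, 2),           # hypothetical +2 semitones
    }
    for name, audio in variants.items():
        print(name, len(audio))
```

Each function returns a new sample list, so several augmented copies of one utterance can be produced and added to the training set alongside the original. In practice, a library implementation would replace the naive OLA stretch to avoid its artifacts.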
