What question did this study set out to answer?

To develop an efficient automatic pronunciation scoring model that leverages synthetic data and active learning.

March 16, 2026 | Open Access

An automatic English pronunciation scoring model using GAN-enhanced synthetic data and active learning

Key points

Used a conditional generative adversarial network to generate synthetic speech samples.
Implemented a hybrid active learning strategy focusing on uncertainty and diversity.
Evaluated the approach using the L2-ARCTIC dataset.
Achieved a Pearson correlation coefficient of 0.843, surpassing the best baseline (0.801).
Reduced root mean square error by 8.0%.
Reduced mean absolute error by 9.8%.
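
The hybrid selection idea in the key points above can be illustrated with a minimal sketch. The summary does not say how the study measures uncertainty or diversity, or how it combines them, so everything here is an assumption: the function name `select_batch`, the weight `alpha`, the use of a per-sample uncertainty value (for example, the variance of scores across an ensemble), and the greedy farthest-point rule in an embedding space are all hypothetical choices, not the authors' method.

```python
import numpy as np

def select_batch(embeddings, uncertainty, k, alpha=0.5):
    """Greedily pick k pool indices by mixing uncertainty and diversity.

    embeddings:  (n, d) array of per-utterance features (hypothetical input)
    uncertainty: (n,) array, e.g. variance of predicted scores in an ensemble
    alpha:       weight on uncertainty; (1 - alpha) goes to diversity
    """
    embeddings = np.asarray(embeddings, dtype=float)
    uncertainty = np.asarray(uncertainty, dtype=float)
    n = embeddings.shape[0]
    k = min(k, n)

    # Rescale uncertainty to [0, 1] so it is comparable with distances.
    u = uncertainty - uncertainty.min()
    u = u / u.max() if u.max() > 0 else np.zeros(n)

    # Start from the single most uncertain sample.
    selected = [int(np.argmax(u))]
    # min_dist[i] = distance from sample i to its nearest selected sample.
    min_dist = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)

    while len(selected) < k:
        d = min_dist / min_dist.max() if min_dist.max() > 0 else np.zeros(n)
        score = alpha * u + (1 - alpha) * d
        score[selected] = -np.inf  # never re-pick a chosen sample
        nxt = int(np.argmax(score))
        selected.append(nxt)
        new_dist = np.linalg.norm(embeddings - embeddings[nxt], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected
```

With `alpha=1.0` this reduces to plain top-k uncertainty sampling, and with `alpha=0.0` it becomes farthest-point sampling seeded at the most uncertain utterance. Intermediate values trade the two off, which is one common way to avoid sending near-duplicate samples to expert annotators.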

Abstract

To address the critical challenges of limited labelled data and high annotation costs in automatic pronunciation assessment, this study proposes a novel framework that integrates generative adversarial network-based synthetic data augmentation with active learning. A conditional generative adversarial network is employed to generate synthetic speech samples with controlled phonemic and articulatory features, while a hybrid active learning strategy combining uncertainty and diversity criteria is designed to select informative samples for expert annotation. Evaluation on the L2-ARCTIC dataset demonstrates that the proposed approach achieves a Pearson correlation coefficient of 0.843, outperforming the best baseline (0.801) by 5.2%. It also reduces root mean square error and mean absolute error by 8.0% and 9.8%, respectively. The results highlight the synergistic effect of generative and selective data strategies, offering an effective solution for low-resource pronunciation scoring.
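
For readers who want to check the reported figures, the three evaluation metrics named in the abstract can be computed as in the sketch below. The function names are illustrative and the abstract does not describe the authors' evaluation code, so this is only a standard-definition sketch. The last lines show that the reported 5.2% gain is consistent with a relative improvement of 0.843 over 0.801; the abstract does not state this explicitly, so treat that reading as an assumption.

```python
import numpy as np

def pearson(human, model):
    """Pearson correlation between human and model scores."""
    return float(np.corrcoef(human, model)[0, 1])

def rmse(human, model):
    """Root mean square error between human and model scores."""
    diff = np.asarray(human, dtype=float) - np.asarray(model, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(human, model):
    """Mean absolute error between human and model scores."""
    diff = np.asarray(human, dtype=float) - np.asarray(model, dtype=float)
    return float(np.mean(np.abs(diff)))

def relative_change(new, old):
    """Fractional change of `new` with respect to `old`."""
    return (new - old) / old

# Relative gain of the proposed model's correlation over the best baseline.
gain_percent = round(100 * relative_change(0.843, 0.801), 1)
print(gain_percent)  # 5.2
```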
