To address the challenges of limited labelled data and high annotation costs in automatic pronunciation assessment, this study proposes a framework that integrates generative adversarial network (GAN)-based synthetic data augmentation with active learning. A conditional GAN generates synthetic speech samples with controlled phonemic and articulatory features, while a hybrid active learning strategy combining uncertainty and diversity criteria selects informative samples for expert annotation. On the L2-ARCTIC dataset, the proposed approach achieves a Pearson correlation coefficient of 0.843, a 5.2% relative improvement over the best baseline (0.801). It also reduces root mean square error and mean absolute error by 8.0% and 9.8%, respectively. The results highlight the synergistic effect of generative and selective data strategies, offering an effective solution for low-resource pronunciation scoring.
Long et al. studied this question.
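The abstract does not spell out how the uncertainty and diversity criteria are combined, so the following is only a minimal sketch of one common way to do it, not the authors' method. It assumes that each unlabelled utterance has a fixed-size embedding and a scalar uncertainty estimate (for example, disagreement among an ensemble of scoring models), and it mixes the two criteria with a hypothetical weight `alpha`. The function name `hybrid_select` and the greedy farthest-point rule for diversity are illustrative choices.

```python
import numpy as np

def hybrid_select(embeddings, uncertainty, k, alpha=0.5):
    """Greedily pick k pool indices, trading off uncertainty and diversity.

    Assumed inputs (not specified in the source):
      embeddings  -- (n, d) array of utterance embeddings
      uncertainty -- (n,) array, higher means the model is less sure
      alpha       -- weight on uncertainty; (1 - alpha) goes to diversity
    """
    embeddings = np.asarray(embeddings, dtype=float)
    uncertainty = np.asarray(uncertainty, dtype=float)
    n = len(embeddings)
    k = min(k, n)
    if k <= 0:
        return []

    # Min-max scale uncertainty to [0, 1] so both terms share a range.
    u = uncertainty - uncertainty.min()
    u = u / u.max() if u.max() > 0 else np.zeros(n)

    # Seed with the most uncertain sample.
    selected = [int(np.argmax(u))]
    # Distance from each pool sample to its nearest selected sample.
    min_dist = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)

    while len(selected) < k:
        d = min_dist / min_dist.max() if min_dist.max() > 0 else np.zeros(n)
        score = alpha * u + (1 - alpha) * d
        score[selected] = -np.inf  # never pick the same sample twice
        nxt = int(np.argmax(score))
        selected.append(nxt)
        new_dist = np.linalg.norm(embeddings - embeddings[nxt], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected

# Toy pool: two tight clusters; the left cluster is the more uncertain one.
pool = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.0, 0.1]])
unc = np.array([0.9, 1.0, 0.2, 0.1])
print(hybrid_select(pool, unc, k=2, alpha=0.5))
print(hybrid_select(pool, unc, k=2, alpha=1.0))
```

With `alpha=0.5` the second pick in this toy pool comes from the far cluster, whereas `alpha=1.0` (uncertainty only) picks both samples from the same cluster. That contrast is the usual motivation for adding a diversity term: it avoids spending annotation budget on near-duplicate utterances.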