February 23, 2024

Evaluation of English Pronunciation Interaction Quality Based on Deep Learning


Abstract

Language-learning applications frequently use automatic pronunciation assessment models. Automatic fluency assessment of spontaneous speech in the absence of reference material is an important task that relies heavily on automatic speech recognition (ASR). This research implements an approach that works around these limitations by combining prosodic, completeness, and fluency scores. To do so, it applies dynamic time warping (DTW) matching to the pitch contours of a weighted average of the context tokens present in the audio file, which is rich in mispronounced phonemes. The model was validated on the speechocean762 dataset, where it achieved a Corre of 0.980, an MSE of 0.072, a PCC of 0.753, a rounded PCC of 0.6534, and a rounded MSE of 0.1122. It was compared with existing methods such as a multimodal automatic speech fluency assessment model and end-to-end (E2E-R) methods.
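
To illustrate the kind of DTW matching of pitch contours the abstract refers to, here is a minimal sketch in Python. The abstract does not say which DTW variant, local cost, or pitch extractor the paper uses, so the absolute-difference cost, the function name `dtw_distance`, and the pitch values (in Hz) below are all assumptions made for illustration, not the author's implementation.

```python
import math

def dtw_distance(a, b):
    """Textbook dynamic time warping between two 1-D sequences.

    Assumption: local cost is the absolute difference between pitch values.
    Runs in O(len(a) * len(b)) time and memory.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats its frame
                cost[i][j - 1],      # b advances, a repeats its frame
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]

# Hypothetical pitch contours in Hz: the second is the first spoken at half speed.
reference = [100, 120, 140, 120, 100]
slower = [100, 100, 120, 120, 140, 140, 120, 120, 100, 100]

print(dtw_distance(reference, slower))  # 0.0
```

Because DTW may stretch either sequence along the time axis, the slower rendition of the same contour aligns with zero cost, while contours that differ in shape accumulate a positive distance. That tolerance to speaking rate is the usual reason for choosing DTW when comparing prosody.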


Cite This Study

Bo Xu studied this question.

synapsesocial.com/papers/68e77f57b6db6435876f31d5 https://doi.org/10.1109/icicacs60521.2024.10498355
