September 1, 2024 (Open Access)

Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning

Abstract

Child speech recognition remains an underdeveloped research area, owing to the scarcity of data (especially for languages other than English) and the specific difficulties of the task. Having explored various architectures for child speech recognition in previous work, in this article we turn to recent self-supervised models. We first compare wav2vec 2.0, HuBERT and WavLM models adapted to phoneme recognition in French child speech, and continue our experiments with the best of them, WavLM base+. We then adapt it further by unfreezing its transformer blocks during fine-tuning on child speech, which greatly improves its performance and makes it significantly outperform our base model, a Transformer+CTC. Finally, we study in detail the behaviour of these two models under the real conditions of our application, and show that WavLM base+ is more robust across reading tasks and noise levels.
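For readers unfamiliar with CTC-based phoneme recognition, the following is a minimal sketch of how frame-level outputs might be turned into a phoneme sequence and scored. It is not the authors' implementation. The blank symbol, the function names, the example phoneme labels, and the use of phoneme error rate as the metric are all assumptions made for illustration, since the abstract does not specify them.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol; real systems usually use an integer index


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Greedy CTC decoding: merge consecutive repeats, then drop blanks."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != blank]


def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Edit distance normalised by reference length (assumed metric)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Illustrative frame-level argmax labels (made up, not from the paper).
frames = ["_", "b", "b", "_", "o", "o", "_", "_", "n", "n"]
decoded = ctc_greedy_decode(frames)
```

Note that a blank between two identical labels keeps them distinct, so `["a", "_", "a"]` decodes to two phonemes rather than one.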

Institutions

Université Toulouse III - Paul Sabatier

Institut Polytechnique de Bordeaux

Institut de Recherche en Informatique de Toulouse
