Child speech recognition remains an underdeveloped research area, owing to the scarcity of data (especially for languages other than English) and to the specific difficulties of the task. Having explored various architectures for child speech recognition in previous work, in this article we turn to recent self-supervised models. We first compare wav2vec 2.0, HuBERT and WavLM models adapted to phoneme recognition in French child speech, and continue our experiments with the best of them, WavLM base+. We then adapt it further by unfreezing its transformer blocks during fine-tuning on child speech, which greatly improves its performance and makes it significantly outperform our baseline model, a Transformer+CTC. Finally, we study in detail the behaviour of these two models under the real conditions of our application, and show that WavLM base+ is more robust across reading tasks and noise levels.
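The idea of unfreezing the transformer blocks during fine-tuning can be illustrated with a minimal PyTorch sketch. This is not the authors' code: `ToyEncoderCTC` is a hypothetical stand-in for a pretrained encoder such as WavLM base+, the layer sizes and the 36-phoneme inventory are arbitrary, and keeping the convolutional feature extractor frozen in both settings is an assumption the abstract does not state.

```python
import torch
from torch import nn


class ToyEncoderCTC(nn.Module):
    """Hypothetical stand-in for a pretrained speech encoder with a CTC head."""

    def __init__(self, n_phonemes=36, dim=32, n_blocks=2):
        super().__init__()
        # Convolutional front end mapping raw audio to frame-level features.
        self.feature_extractor = nn.Conv1d(1, dim, kernel_size=10, stride=5)
        block = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, dim_feedforward=64, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(block, num_layers=n_blocks)
        # One extra output class for the CTC blank symbol.
        self.ctc_head = nn.Linear(dim, n_phonemes + 1)

    def forward(self, wav):
        # wav: (batch, samples) -> feats: (batch, frames, dim)
        feats = self.feature_extractor(wav.unsqueeze(1)).transpose(1, 2)
        # Per-frame log-probabilities over phonemes plus blank.
        return self.ctc_head(self.transformer(feats)).log_softmax(dim=-1)


def set_trainable(model, unfreeze_transformer):
    """Choose which parts of the model receive gradient updates."""
    for p in model.feature_extractor.parameters():
        p.requires_grad = False  # assumption: front end always frozen
    for p in model.transformer.parameters():
        p.requires_grad = unfreeze_transformer
    for p in model.ctc_head.parameters():
        p.requires_grad = True  # the task-specific head is always trained


def count_trainable(model):
    """Number of parameters that will be updated by the optimizer."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


model = ToyEncoderCTC()
set_trainable(model, unfreeze_transformer=False)
frozen_count = count_trainable(model)    # head only
set_trainable(model, unfreeze_transformer=True)
unfrozen_count = count_trainable(model)  # head plus transformer blocks
log_probs = model(torch.randn(2, 1000))  # (batch, frames, classes)
```

With the transformer frozen, only the output layer adapts to child speech; unfreezing lets the contextual representations themselves adapt, at the cost of many more trainable parameters.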
Université Toulouse III - Paul Sabatier
Institut Polytechnique de Bordeaux
Institut de Recherche en Informatique de Toulouse
Medin et al. (Sun,) studied this question.