The high variability in the acoustic, pronunciation, and linguistic characteristics of children's speech makes automatic speech recognition (ASR) for children a complex task. Training a dedicated ASR model from scratch for children remains challenging, mainly because children's speech data is scarce. A common strategy for tackling this limitation is to fine-tune a pre-trained ASR model. However, fine-tuning is itself hampered by speaker diversity and data scarcity, especially for large ASR models such as the Conformer. In this study, we explore an alternative approach known as Adapter transfer, which requires training fewer parameters and can be more effective in adapting large ASR models to children's speech. We assess various Adapter configurations from the literature and introduce a novel configuration called Two Serial Adapter (TSA). The experimental results indicate that Adapter transfer consistently outperforms traditional fine-tuning across various configurations for the Conformer model.
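To make the "fewer parameters" point concrete, here is a minimal pure-Python sketch of a generic residual bottleneck adapter, the kind of small module that is trained while the pre-trained model stays frozen. It is an illustration under assumptions, not the paper's implementation: the function names, the ReLU non-linearity, the zero-initialised up-projection, and the sizes used below are all hypothetical, and the sketch does not attempt to reproduce the TSA configuration.

```python
import random

def make_adapter(d_model, bottleneck, seed=0):
    """Build a residual bottleneck adapter as plain nested lists (hypothetical layout)."""
    rng = random.Random(seed)
    # Down-projection, shape (bottleneck, d_model): small random weights.
    down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(bottleneck)]
    # Up-projection, shape (d_model, bottleneck): zeros, so the adapter starts
    # as an identity mapping and does not disturb the frozen model (assumption).
    up = [[0.0] * bottleneck for _ in range(d_model)]
    return {"down": down, "up": up}

def adapter_forward(adapter, x):
    """Apply the adapter to one feature vector: y = x + W_up * ReLU(W_down * x)."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in adapter["down"]]
    return [xi + sum(w * hi for w, hi in zip(row, h)) for xi, row in zip(x, adapter["up"])]

def adapter_param_count(d_model, bottleneck):
    """Count trainable weights in one adapter (two matrices, biases omitted)."""
    return 2 * d_model * bottleneck
```

With the illustrative sizes d_model = 256 and bottleneck = 32, one such adapter holds 2 × 256 × 32 = 16,384 trainable weights, compared with 65,536 for a single 256 × 256 dense layer. These sizes are chosen only for the arithmetic and are not taken from the paper.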
Thomas Rolland
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento
Alberto Abad
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento
Rolland et al. studied this question.
synapsesocial.com/papers/68e7376bb6db6435876b1131 — DOI: https://doi.org/10.1109/icassp48485.2024.10447091