March 18, 2024 · Open Access

Improved Children’s Automatic Speech Recognition Combining Adapters and Synthetic Data Augmentation

Abstract

Children's automatic speech recognition (ASR) is challenging because children's speech is highly variable. Limited training data makes this variability hard to model, a problem that can be partially addressed by using a text-to-speech (TTS) system for data augmentation. However, generated data may contain imperfections that can degrade performance. In this work, we use adapters to handle the domain mismatch when fine-tuning with TTS data. Training proceeds in two steps: first, adapter layers are trained on synthetic data while the pre-trained model stays frozen; then, the adapters and the entire model are fine-tuned on a mix of synthetic and real data, where only synthetic data passes through the adapters. Experimental results show up to a 6% relative reduction in word error rate (WER) compared to the straightforward use of synthetic data, indicating that adapter-based architectures are effective at learning from imperfect synthetic data.
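
The two-step schedule and the data routing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the adapter design or where adapters sit in the model, so the residual form, the zero initialisation, and every name below (`ToyAdapterLayer`, `trainable_parameters`, `training_pool`) are assumptions made purely for illustration, and a single scalar weight stands in for each group of parameters.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class ToyAdapterLayer:
    """Hypothetical stand-in for one model layer with an attached adapter."""
    base_weight: float = 1.0     # stands in for the pre-trained weights
    adapter_weight: float = 0.0  # assumed zero init: adapter starts as identity

    def forward(self, features: List[float], is_synthetic: bool) -> List[float]:
        hidden = [self.base_weight * x for x in features]
        if is_synthetic:
            # Only synthetic (TTS) inputs pass through the adapter.
            # A residual form is assumed here; real speech skips this branch.
            hidden = [h + self.adapter_weight * h for h in hidden]
        return hidden


def trainable_parameters(step: int) -> Set[str]:
    """Return which parameter groups are updated in each training step."""
    if step == 1:
        return {"adapter_weight"}                 # pre-trained model frozen
    if step == 2:
        return {"base_weight", "adapter_weight"}  # everything fine-tuned
    raise ValueError("step must be 1 or 2")


def training_pool(step: int, real: List[str],
                  synthetic: List[str]) -> List[Tuple[str, bool]]:
    """Tag each utterance with an is_synthetic flag used for routing."""
    tagged_synthetic = [(utt, True) for utt in synthetic]
    if step == 1:
        return tagged_synthetic                   # synthetic data only
    if step == 2:
        return [(utt, False) for utt in real] + tagged_synthetic
    raise ValueError("step must be 1 or 2")
```

In this sketch, real utterances carry `is_synthetic=False` and therefore see only the base weights, so any correction the adapter learns for TTS artefacts is confined to the synthetic path.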

Authors

Thomas Rolland

Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento

Alberto Abad

Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento

Institutions

Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento
