September 1, 2024

Convolution-Augmented Parameter-Efficient Fine-Tuning for Speech Recognition

Kwangyoun Kim (NetApp, United States), Suwon Shon (NetApp, United States), Yi-Te Hsu (National Institutes of Health)

Abstract

Parameter-efficient fine-tuning (PEFT) methods, which train only a subset of a model's parameters, produce models that are both efficient and effective. Bottleneck approaches such as adapters and low-rank adaptation (LoRA) have proven beneficial in numerous studies and are widely used. In this work, we propose and investigate an enhanced PEFT method that adds convolution to bottleneck approaches built on linear projections. We experiment with HuBERT, a representative speech model pre-trained with self-supervised learning, and fine-tune it for automatic speech recognition (ASR) to examine how the proposed method affects training and inference. We demonstrate consistent performance improvements with only a minimal increase in parameters and computational complexity.
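The abstract does not spell out the architecture, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a LoRA-style bottleneck placed in parallel with a frozen linear projection, with a depthwise 1-D convolution over the time axis inserted between the down- and up-projections. The names `ConvLoRALinear` and `depthwise_conv1d` are hypothetical. The kernel size, the identity initialization of the convolution, and the `alpha / rank` scaling are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def depthwise_conv1d(z, kernel):
    """Depthwise 1-D convolution over time with zero 'same' padding.

    z:      (T, C) bottleneck activations, one row per frame
    kernel: (k, C) one length-k filter per channel, k odd
    Computed as cross-correlation, as deep-learning frameworks do.
    """
    k, channels = kernel.shape
    assert k % 2 == 1 and z.shape[1] == channels
    pad = k // 2
    padded = np.pad(z, ((pad, pad), (0, 0)))  # zero-pad the time axis only
    out = np.zeros_like(z)
    for j in range(k):
        # Tap j reads frame t + j - pad for every output frame t.
        out += kernel[j] * padded[j:j + z.shape[0]]
    return out


class ConvLoRALinear:
    """Hypothetical frozen linear layer plus a LoRA-style bottleneck with a conv inside."""

    def __init__(self, weight, rank=8, kernel_size=3, alpha=16.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                                # frozen pre-trained projection
        self.down = rng.normal(0.0, 0.02, (rank, d_in))     # trainable down-projection A
        self.up = np.zeros((d_out, rank))                   # trainable up-projection B, zero init
        self.kernel = np.zeros((kernel_size, rank))         # trainable depthwise filters
        self.kernel[kernel_size // 2] = 1.0                 # assumption: start as identity filter
        self.scale = alpha / rank

    def trainable_params(self):
        return self.down.size + self.up.size + self.kernel.size

    def __call__(self, x):
        """x: (T, d_in) sequence of frame features -> (T, d_out)."""
        z = x @ self.down.T                     # (T, rank) down-projection
        z = depthwise_conv1d(z, self.kernel)    # mix neighbouring frames per channel
        return x @ self.weight.T + self.scale * (z @ self.up.T)


rng = np.random.default_rng(1)
w = rng.normal(size=(768, 768))
layer = ConvLoRALinear(w, rank=8, kernel_size=3)
x = rng.normal(size=(50, 768))
y = layer(x)
print(y.shape)                   # (50, 768)
print(layer.trainable_params())  # 12312 = 8*768 + 768*8 + 3*8
print(np.allclose(y, x @ w.T))   # True, because the up-projection starts at zero
```

Because the up-projection is zero-initialized, as in standard LoRA, the layer initially reproduces the frozen projection exactly. The depthwise kernel adds only `kernel_size * rank` parameters (24 here) on top of the 12,288 of a plain rank-8 LoRA on a 768-dimensional projection. This illustrates the kind of small overhead the abstract describes, although the paper's actual placement and configuration may differ.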
