In low-resource automatic speech recognition (ASR), parameter-efficient fine-tuning (PEFT) has become a key approach for adapting large pre-trained speech models. Low-rank adaptation (LoRA) is efficient, stable, and easy to deploy, but its performance remains constrained because random initialization does not capture the time–frequency structure of speech signals. To address this limitation, this work proposes Discrete Wavelet Transform-Based LoRA Initialization (DWTLoRA), a structured initialization mechanism that integrates LoRA with the discrete wavelet transform (DWT). By combining wavelet-based initialization, a multi-scale fusion mechanism, and a residual strategy, the method constructs a low-rank adaptation subspace that better aligns with the local time–frequency properties of speech. DWTLoRA thus lets LoRA modules incorporate a prior on speech dynamics at the start of fine-tuning, which substantially narrows the search over ineffective directions during early training and improves convergence speed, training stability, and recognition accuracy under low-resource conditions. Experiments on Sichuan dialect speech recognition with the Whisper architecture show that DWTLoRA outperforms standard LoRA and several PEFT baselines in character error rate (CER) and training efficiency, confirming the importance of signal-structure-aware initialization in low-resource ASR.
Lan et al. (Thu,) studied this question.
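To make the idea of a wavelet-structured LoRA initialization concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Haar wavelet, a power-of-two input dimension, and the usual LoRA factorization of the update as `B @ A`; the function names `haar_matrix` and `wavelet_lora_init` are hypothetical. The multi-scale fusion mechanism and the residual strategy mentioned above are not modeled here.

```python
import numpy as np


def haar_matrix(n: int) -> np.ndarray:
    """Orthonormal Haar DWT matrix of shape (n, n); n must be a power of two.

    Rows are ordered from the coarsest scale (global average) to the finest.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1.0]])
    while h.shape[0] < n:
        m = h.shape[0]
        coarse = np.kron(h, [1.0, 1.0])           # stretch existing basis rows
        detail = np.kron(np.eye(m), [1.0, -1.0])  # new finest-scale differences
        h = np.vstack([coarse, detail]) / np.sqrt(2.0)
    return h


def wavelet_lora_init(d_out: int, d_in: int, rank: int):
    """Hypothetical initializer: A holds the `rank` coarsest Haar basis rows.

    B starts at zero (an assumption carried over from standard LoRA), so the
    adapted layer initially reproduces the pre-trained layer exactly.
    """
    if not 1 <= rank <= d_in:
        raise ValueError("rank must be between 1 and d_in")
    a = haar_matrix(d_in)[:rank].copy()  # shape (rank, d_in), orthonormal rows
    b = np.zeros((d_out, rank))          # shape (d_out, rank)
    return a, b


if __name__ == "__main__":
    a, b = wavelet_lora_init(d_out=512, d_in=512, rank=8)
    print(a.shape, b.shape)
    print(np.allclose(a @ a.T, np.eye(8)))  # rows of A are orthonormal
```

In this sketch the rows of `A` span a structured, multi-resolution subspace instead of a random one, while the zero `B` keeps the initial weight update at zero. Whether DWTLoRA applies the transform to a fixed basis, to pre-trained weights, or to something else is not stated in the abstract, so this choice is purely illustrative.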