Personalizing automatic speech recognition (ASR) for dysarthric speech is crucial but challenging because it requires training and storing an individual adapter for each user. We propose a hybrid meta-training method that yields a single model that excels at zero-shot and few-shot on-the-fly personalization via in-context learning (ICL). Measuring word error rate (WER) on state-of-the-art subsets, the model achieves 13.9% WER on Euphonia, which surpasses speaker-independent baselines (17.5% WER) and rivals user-specific personalized models. On SAP Test 1, its 5.3% WER significantly improves on the 8% WER achieved even by personalized adapters. We also demonstrate the importance of example curation: with an oracle text-similarity method, 5 curated examples achieve performance similar to 19 randomly selected ones, highlighting a key area for future efficiency gains. Finally, we conduct data ablations to measure the data efficiency of this approach. This work presents a practical, scalable, and personalized solution.
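For readers unfamiliar with the metric, the following is a minimal sketch of how WER is conventionally computed (word-level edit distance divided by the number of reference words) and of how the relative reductions implied by the figures above can be derived. The function names `word_error_rate` and `relative_wer_reduction` are illustrative, and the whitespace tokenization is an assumption; the abstract does not describe the authors' scoring or text normalization pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: word-level edit distance / number of reference words.

    Assumption: words are separated by whitespace and compared exactly;
    real evaluations usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Fraction by which system_wer improves on baseline_wer."""
    return (baseline_wer - system_wer) / baseline_wer


if __name__ == "__main__":
    # Toy transcripts, not taken from Euphonia or SAP.
    print(word_error_rate("the cat sat", "the bat sat"))
    # WER figures quoted in the abstract.
    print(relative_wer_reduction(17.5, 13.9))
    print(relative_wer_reduction(8.0, 5.3))
```

Under this definition, the reported figures correspond to a relative WER reduction of roughly 21% on Euphonia (17.5% to 13.9%) and roughly 34% on SAP Test 1 (8% to 5.3%).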
Dhruuv Agarwal
Google (United States)
Harry Zhang
University of Wisconsin–Madison
Yu Yang
Xi'an Jiaotong University
Agarwal et al. studied this question.
synapsesocial.com/papers/68de5da283cbc991d0a20a0f — DOI: https://doi.org/10.48550/arxiv.2509.15516