What type of study is this?

This is a quantitative study.

October 2, 2025 | Open Access

State-of-the-Art Dysarthric Speech Recognition with MetaICL for on-the-fly Personalization

Dhruuv Agarwal (Google, United States), Harry Zhang (University of Wisconsin–Madison), Yu Yang (Xi'an Jiaotong University)

Key Points

The approach achieves a 13.9% word error rate (WER) on Euphonia, outperforming speaker-independent baselines at 17.5%.
On SAP Test 1, the model's 5.3% WER improves on the 8% achieved by personalized adapters.
With an oracle text-similarity selection method, 5 curated examples perform similarly to 19 randomly selected examples, which points to potential efficiency gains.
Data ablations measure the data efficiency of the proposed hybrid meta-training method.
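
For readers unfamiliar with the metric, the sketch below shows how a word error rate of the kind quoted above is typically computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. It is an illustration only, not the paper's evaluation code. The function names `word_error_rate` and `relative_reduction` are hypothetical, and the whitespace tokenization is an assumption (published results usually apply text normalization and aggregate errors over a whole test set).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: plain whitespace tokenization with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    print(word_error_rate("turn on the kitchen lights", "turn on kitchen light"))
    # WER figures taken from the key points above.
    print(relative_reduction(17.5, 13.9))  # Euphonia vs. speaker-independent baseline
    print(relative_reduction(8.0, 5.3))    # SAP Test 1 vs. personalized adapters
```

Applied to the figures above, the relative reductions come out to roughly 21% on Euphonia and 34% on SAP Test 1. The latter is approximate because the baseline is reported only as 8%.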

Abstract

Personalizing Automatic Speech Recognition (ASR) for dysarthric speech is crucial but challenging because individual user adapters must be trained and stored. We propose a hybrid meta-training method for a single model that excels at zero-shot and few-shot on-the-fly personalization via in-context learning (ICL). In terms of Word Error Rate (WER) on state-of-the-art subsets, the model achieves 13.9% on Euphonia, which surpasses speaker-independent baselines (17.5% WER) and rivals user-specific personalized models. On SAP Test 1, its 5.3% WER significantly bests the 8% from even personalized adapters. We also demonstrate the importance of example curation: with an oracle text-similarity method, 5 curated examples achieve performance similar to 19 randomly selected ones, highlighting a key area for future efficiency gains. Finally, we conduct data ablations to measure the data efficiency of this approach. This work presents a practical, scalable, and personalized solution.
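
The abstract does not say how the oracle text-similarity method scores examples, so the sketch below only illustrates the general idea. It ranks a speaker's pool of transcribed examples by similarity to the target utterance's transcript and keeps the top 5 as in-context examples. Jaccard word overlap stands in here for whatever measure the authors use, and the names `jaccard_similarity` and `select_examples`, as well as the `(utterance_id, transcript)` pool layout, are hypothetical. The method is presumably "oracle" because it relies on the target transcript, which is not available at inference time.

```python
from typing import List, Tuple

Example = Tuple[str, str]  # (utterance_id, transcript); hypothetical layout


def jaccard_similarity(a: str, b: str) -> float:
    """Word-set overlap between two transcripts (a stand-in similarity measure)."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def select_examples(target_transcript: str, pool: List[Example], k: int = 5) -> List[Example]:
    """Return the k pool examples whose transcripts are most similar to the target.

    Ties keep their original pool order because Python's sort is stable.
    """
    ranked = sorted(
        pool,
        key=lambda example: jaccard_similarity(target_transcript, example[1]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    pool = [
        ("u1", "turn on the lights"),
        ("u2", "what is the weather today"),
        ("u3", "turn off the lights"),
        ("u4", "play some music"),
    ]
    for utterance_id, transcript in select_examples("turn on the kitchen lights", pool, k=2):
        print(utterance_id, transcript)
```

A deployable variant would need a similarity signal that does not depend on the reference transcript, which is consistent with the abstract framing curation as an area for future work.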
