Improving diarization accuracy in clinical conversations remains challenging due to atypical communication patterns, speech disfluencies, and the scarcity of annotated training data. We present a large language model (LLM)-based diarization correction approach fine-tuned on synthetic medical dialogues augmented with controlled injection of simulated disfluencies and ASR errors into the transcripts. This targeted augmentation strategy enables the model to better handle conversational variability common in clinical contexts. We evaluate our approach using two real-world clinical datasets: the PriMock57 corpus, which features realistic multi-speaker clinical conversations with interruptions and hesitations, and the Pitt Corpus from DementiaBank, which includes speech from individuals with dementia. Compared to Google’s pre-trained DiarizeLM model trained on the Fisher dataset, our fine-tuned model achieves consistently lower diarization word error rates (DWER) across both corpora, with notable improvements in disfluent and role-switching segments. These results highlight the value of disfluency-augmented synthetic training for enhancing downstream diarization correction in clinical speech. Our method offers a scalable and privacy-preserving pathway for adapting general-purpose ASR and diarization systems to sensitive healthcare settings, where conventional models often fail due to mismatched training conditions and speaker variability.
Kathiresan et al. studied this question.
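The abstract does not spell out how disfluencies are injected into the synthetic transcripts. As a rough illustration only, the sketch below assumes a word-level transcript with one speaker label per word and inserts filled pauses or word repetitions at a controlled rate, giving each inserted token the speaker label of the word it precedes. The function name `inject_disfluencies`, the `FILLERS` list, and the `rate` and `seed` parameters are hypothetical and are not taken from the paper; simulated ASR errors are left out for brevity.

```python
import random

# Hypothetical filler inventory; the paper's actual list is not given.
FILLERS = ["um", "uh", "er", "hmm"]


def inject_disfluencies(words, speakers, rate=0.15, seed=0):
    """Return (words, speakers) with simulated disfluencies inserted.

    For each word, with probability `rate` one extra token is inserted
    before it: either a filled pause or a repetition of the word itself.
    Inserted tokens inherit the speaker label of the word they precede,
    so the two output lists stay aligned word for word.
    """
    if len(words) != len(speakers):
        raise ValueError("words and speakers must have the same length")

    rng = random.Random(seed)  # seeded for reproducible augmentation
    out_words, out_speakers = [], []

    for word, spk in zip(words, speakers):
        r = rng.random()
        if r < rate / 2:
            # Filled pause, e.g. "um pain"
            out_words.append(rng.choice(FILLERS))
            out_speakers.append(spk)
        elif r < rate:
            # Repetition, e.g. "pain pain"
            out_words.append(word)
            out_speakers.append(spk)
        out_words.append(word)
        out_speakers.append(spk)

    return out_words, out_speakers


if __name__ == "__main__":
    words = "how long has the pain lasted about two days".split()
    speakers = ["doctor"] * 6 + ["patient"] * 3
    aug_words, aug_speakers = inject_disfluencies(words, speakers, rate=0.4, seed=1)
    for w, s in zip(aug_words, aug_speakers):
        print(f"{s}\t{w}")
```

Keeping the speaker labels aligned with the augmented word sequence matters here, because a correction model of this kind is trained to reassign speaker labels at the word level.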