What question did this study set out to answer?

This research aims to improve diarization accuracy in clinical conversations affected by disfluencies and communication variability.

May 14, 2026

Enhancing diarization correction in clinical conversations via synthetic disfluency-augmented training

Key Points

Developed a diarization correction approach using a large language model (LLM) fine-tuned on synthetic medical dialogues.
Augmented training data with controlled simulations of disfluencies and ASR errors.
Evaluated the model on PriMock57 and DementiaBank datasets featuring clinical speech.
Achieved lower diarization word error rates (DWER) than Google’s DiarizeLM on both datasets.
Notable improvements were observed in segments with disfluencies and role-switching.
Showed that disfluency-augmented synthetic training helps adapt general-purpose ASR and diarization systems to healthcare settings.
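
To make the DWER comparison in the key points above concrete, here is a minimal sketch of a word-level speaker error rate. It is a simplification, not the study's scoring procedure: it assumes the reference and hypothesis transcripts are already aligned word for word and that speaker labels are already mapped to the same names. The function name `word_speaker_error_rate` is hypothetical.

```python
def word_speaker_error_rate(ref_speakers, hyp_speakers):
    """Fraction of words whose speaker label differs from the reference.

    Assumption: both lists hold one speaker label per word, in the same
    word order, with labels already mapped to a shared naming scheme.
    """
    if len(ref_speakers) != len(hyp_speakers):
        raise ValueError("reference and hypothesis must have the same length")
    if not ref_speakers:
        return 0.0
    errors = sum(r != h for r, h in zip(ref_speakers, hyp_speakers))
    return errors / len(ref_speakers)


# One of four words is attributed to the wrong speaker.
ref = ["doctor", "doctor", "patient", "patient"]
hyp = ["doctor", "patient", "patient", "patient"]
print(word_speaker_error_rate(ref, hyp))  # 0.25
```

A full evaluation would also need to align words when the ASR output inserts or drops tokens, which this sketch leaves out.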

Abstract

Improving diarization accuracy in clinical conversations remains challenging due to atypical communication patterns, speech disfluencies, and the scarcity of annotated training data. We present a large language model (LLM)-based diarization correction approach fine-tuned on synthetic medical dialogues augmented with controlled injection of simulated disfluencies and ASR errors into the transcripts. This targeted augmentation strategy enables the model to better handle conversational variability common in clinical contexts. We evaluate our approach using two real-world clinical datasets: the PriMock57 corpus, which features realistic multi-speaker clinical conversations with interruptions and hesitations, and the Pitt Corpus from DementiaBank, which includes speech from individuals with dementia. Compared to Google’s pre-trained DiarizeLM model trained on the Fisher dataset, our fine-tuned model achieves consistently lower diarization word error rates (DWER) across both corpora, with notable improvements in disfluent and role-switching segments. These results highlight the value of disfluency-augmented synthetic training for enhancing downstream diarization correction in clinical speech. Our method offers a scalable and privacy-preserving pathway for adapting general-purpose ASR and diarization systems to sensitive healthcare settings, where conventional models often fail due to mismatched training conditions and speaker variability.
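
The augmentation idea described in the abstract, controlled injection of simulated disfluencies and ASR errors into transcripts, might be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the filler list, the three probabilities, and the names `inject_noise` and `augment_dialogue` are all hypothetical, and word deletion stands in for ASR errors in general.

```python
import random

# Hypothetical filler inventory; the study does not list the fillers it used.
FILLERS = ["um", "uh", "you know", "I mean"]


def inject_noise(words, rng, filler_prob=0.1, repeat_prob=0.05, drop_prob=0.02):
    """Return a noisy copy of `words` with fillers, repetitions, and deletions.

    All probabilities are illustrative defaults, not values from the study.
    """
    noisy = []
    for word in words:
        if rng.random() < drop_prob:      # simulated ASR deletion
            continue
        if rng.random() < filler_prob:    # filled pause before the word
            noisy.append(rng.choice(FILLERS))
        noisy.append(word)
        if rng.random() < repeat_prob:    # word repetition
            noisy.append(word)
    return noisy


def augment_dialogue(turns, seed=0, **noise_kwargs):
    """Apply `inject_noise` to each (speaker, text) turn, keeping speaker labels."""
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    return [
        (speaker, " ".join(inject_noise(text.split(), rng, **noise_kwargs)))
        for speaker, text in turns
    ]


turns = [
    ("doctor", "how long have you had the cough"),
    ("patient", "about two weeks it gets worse at night"),
]
for speaker, text in augment_dialogue(turns, seed=42, filler_prob=0.3):
    print(f"{speaker}: {text}")
```

Keeping the speaker labels untouched while perturbing only the words means each noisy transcript still has a known correct attribution, which is what a correction model would need as a training target.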

Cite This Study

Kathiresan et al. studied this question.

synapsesocial.com/papers/6a0567e9a550a87e60a201e8 https://doi.org/10.1121/10.0040616
