May 15, 2026 · Open Access

Automatic Speech Recognition in Healthcare in the Post-LLM Era: A Scoping Review

Key Points

This review aims to explore the impact of large language models on automatic speech recognition in healthcare.
Scoping review following PRISMA-ScR guidelines.
Search of peer-reviewed, open-access studies published from January 2022 to December 2025.
Nineteen studies were included from an initial screening of 384 records.
Administrative documentation was the primary application (42.1%).
Whisper was the most commonly used ASR technology (52.6%), often combined with LLMs like GPT or LLaMA.
Documentation time reductions ranged from 30% to 90%, with noted gaps in evaluation and privacy concerns.

Abstract

Context: Automatic Speech Recognition (ASR) in healthcare is undergoing a significant shift driven by the integration of Large Language Models (LLMs). While traditional ASR focused on transcription fidelity, LLM-based systems extend this capability to intelligently reason, summarize, and structure clinical data. This scoping review maps the emerging landscape of LLM-based ASR in healthcare, examining its applications, technical foundations, evaluation practices, and reported challenges.

Methods: Following PRISMA-ScR guidelines, we searched different databases for peer-reviewed, open-access studies published between January 2022 and December 2025 to ensure reproducibility and accessibility.

Results: Nineteen studies met the inclusion criteria from 384 screened records. Administrative documentation was the most common application (42.1%), followed by diagnosis, therapy, and doctor–patient communication. Whisper dominated ASR (52.6%), typically paired with GPT-family or LLaMA-family LLMs in frozen configurations steered through prompting. LLMs served as the primary component in 68.4% of studies. ASR evaluation within the reviewed studies predominantly relied on word error rate, while LLM evaluation remains fragmented with no standard metric. Studies reported documentation time reductions of 30–90%, though privacy reporting was inconsistent, equity concerns were rarely tested systematically, and only five studies provided replication packages.

Conclusions: LLM-based ASR shows potential for reducing documentation burden and supporting clinical workflows, but gaps in evaluation standardization, equity testing, and reproducibility must be addressed before safe clinical deployment.
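
The abstract identifies word error rate (WER) as the predominant ASR metric in the reviewed studies. For readers unfamiliar with it, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR output, divided by the number of reference words. The function name and the example sentences are hypothetical and are not taken from any reviewed study; real evaluations usually also normalize case and punctuation first, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("chest" -> "chess") and one
# deletion ("today") against a five-word reference.
wer = word_error_rate(
    "patient denies chest pain today",
    "patient denies chess pain",
)
print(f"WER = {wer:.2f}")
```

Because WER counts every word equally, a single substituted clinical term weighs the same as a dropped filler word, which is one reason a transcription metric alone says little about the quality of LLM-generated summaries or structured notes.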
