What does this research mean for the field?

Large language models can effectively assist in interpreting pure-tone audiograms for patients, improving comprehension and emotional support, though they cannot replace physicians' diagnostic capabilities. Novelty: synthesis. Consensus alignment: neutral.

What question did this study set out to answer?

Evaluate the effectiveness of large language models in interpreting pure-tone audiograms for improved patient understanding.

March 16, 2026 | Open Access

A multicenter multifunctional assessment of large language models in pure-tone audiogram interpretation for patients

Key Points

Blinded multicenter evaluation of eight LLMs
Analysis of 140 audiogram reports
Assessment by clinicians and lay reviewers
Diagnostic and interpretive tasks performed by LLMs
DeepSeek-V3 achieved the highest diagnostic accuracy for severity (67%) and type (54%)
R1 was best suited to a general readership, with a Flesch-Kincaid Grade Level (FKGL) score of 6.41
Significant perceived benefits in comprehension and emotional support from all models
Gemini 2.0 Flash/Thinking scored higher in user satisfaction
Challenges identified in understanding pathological mechanisms and managing hallucinations

Abstract

No large language models (LLMs) have yet been evaluated for their understanding of image-based reports. Pure-tone audiograms, the gold standard for hearing loss assessment, are technical and often incomprehensible to patients without specialist interpretation. We conducted a blinded, multicenter evaluation of eight LLMs across diagnostic, interpretive, and recommendation tasks using 140 audiogram reports, assessed by clinicians and lay reviewers. DeepSeek-V3 achieved the highest diagnostic accuracy (severity: 67.00%; type: 54.00%), while R1 proved most suitable for a general readership (FKGL: 6.41). The general public perceived significant benefits from all models in comprehension and emotional support, with Gemini 2.0 Flash/Thinking scoring higher. Challenges remain in understanding pathological mechanisms and controlling hallucinations. While current general-purpose LLMs cannot replace the diagnostic capabilities of physicians, they may serve as effective auxiliary tools for translating specialized audiogram data into structured, patient-accessible interpretations, with particular relevance for populations facing limited access to hearing-care services.
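The readability figure above is a Flesch-Kincaid Grade Level (FKGL), computed by the standard formula 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59; lower scores mean easier text. The abstract does not say which tool the authors used, so the following is only a minimal sketch of the formula. The function names `count_syllables` and `fkgl` are hypothetical, and the vowel-group syllable counter is a rough assumption that will differ from dictionary-based tools.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of consecutive vowels (assumed heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level of an English text."""
    # Sentences are split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are alphabetic runs, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    sample = "Your hearing test shows mild hearing loss in both ears."
    print(round(fkgl(sample), 2))
```

A score near 6, like the 6.41 reported for R1, corresponds roughly to text readable at a sixth-grade level.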
