Large language models (LLMs) are demonstrating transformative potential in medical informatics, assisting with tasks ranging from diagnostic reasoning to patient communication. However, their propensity to generate confident yet unfounded outputs, termed hallucinations, poses significant risks to patient safety and clinical accountability. This paper presents a systematic literature review of research from 2023 to 2025, analyzing the risks, benchmarks, detection paradigms, and mitigation strategies associated with medical hallucinations. We synthesize our findings into a novel evaluative framework, CR² (Capability × Reliability × Cost × Clinical Risk), designed to guide risk-aware adoption. Our analysis confirms that hallucination is a structural property of autoregressive text generation under uncertainty. Consequently, we argue that hybrid control, which integrates retrieval grounding, verification mechanisms, calibrated generation, and human oversight, constitutes the most credible path toward trustworthy deployment. The review concludes by identifying critical open challenges, including the need for harm-weighted evaluation, multilingual generalisability, and operational governance mechanisms.
Shengxuan Huang
Shengxuan Huang studied this question.
www.synapsesocial.com/papers/69d9e62078050d08c1b7661a — DOI: https://doi.org/10.1051/itmconf/20268403005/pdf