What question did this study set out to answer?

The study aimed to analyze spatial hallucination phenomena in radiology visual question answering (VQA).

May 6, 2026 · Open Access

Spatially-Grounded Hallucination Detection in Multimodal Medical AI: A Study of Vision-Language Models on Radiology Visual Question Answering

Key Points

Set out to analyze spatial hallucination phenomena in radiology visual question answering (VQA)
Introduced a four-tier taxonomy of medical spatial hallucinations
Analyzed 14 vision-language models across six radiology VQA benchmarks
Evaluated spatial grounding fidelity using established metrics and a new composite metric, the Spatially-Grounded Hallucination Index (SGHI)
Spatial hallucination rates on radiology tasks ranged from 23.7% to 41.2%
These rates substantially exceed the 12.4%–18.9% observed on natural image benchmarks
Reviewed mitigation strategies and outlined a research roadmap toward spatially faithful multimodal medical AI

Abstract

Vision-Language Models (VLMs) are increasingly being adopted in radiology for tasks ranging from automated image interpretation to report generation and visual question answering (VQA). Yet these models have a well-documented tendency to produce clinically unfaithful outputs, commonly referred to as hallucinations, which raise serious patient safety concerns in diagnostic settings. Although hallucination detection has attracted growing interest in the broader computer vision community, the specific problem of spatially-grounded hallucination within medical imaging has received comparatively little attention. This paper addresses that gap. We present what is, to our knowledge, the first systematic analysis focused specifically on spatial hallucination phenomena in radiology VQA. We introduce a four-tier taxonomy of medical spatial hallucinations, organized into Existence Fabrication, Anatomical Mislocalization, Spatial Relationship Distortion, and Volumetric Reasoning Failure, each grounded in clinical radiology practice. We analyze 14 VLMs across six radiology VQA benchmarks and evaluate their spatial grounding fidelity using established metrics alongside a new composite metric we call the Spatially-Grounded Hallucination Index (SGHI). Our findings indicate that spatial hallucination rates on radiology tasks range from 23.7% to 41.2%, substantially exceeding the 12.4%–18.9% observed on natural image benchmarks. We also review mitigation strategies and lay out a research roadmap toward clinically trustworthy, spatially-faithful multimodal medical AI.
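To make the reported quantities concrete, the following is a minimal sketch of how an overall spatial hallucination rate, and a breakdown by the four taxonomy tiers, could be tallied from per-answer annotations. It is an illustration only: the function name `hallucination_rates`, the label scheme (one tier per answer, or `None` for a faithful answer), and the example annotations are assumptions made here, not the study's annotation protocol or data. It does not compute the SGHI, whose definition is not given in the abstract.

```python
from collections import Counter
from enum import Enum


class SpatialHallucination(Enum):
    """The four taxonomy tiers named in the abstract."""
    EXISTENCE_FABRICATION = "Existence Fabrication"
    ANATOMICAL_MISLOCALIZATION = "Anatomical Mislocalization"
    SPATIAL_RELATIONSHIP_DISTORTION = "Spatial Relationship Distortion"
    VOLUMETRIC_REASONING_FAILURE = "Volumetric Reasoning Failure"


def hallucination_rates(labels):
    """Return (overall_rate, per_tier_rates) for a list of per-answer labels.

    Assumption: each label is a SpatialHallucination member, or None when the
    answer was judged spatially faithful. Rates are fractions of all answers.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    total = len(labels)
    counts = Counter(label for label in labels if label is not None)
    overall = sum(counts.values()) / total
    per_tier = {tier: counts.get(tier, 0) / total for tier in SpatialHallucination}
    return overall, per_tier


# Hypothetical annotations for ten answers (illustrative only, not study data).
example_labels = [
    None, None, SpatialHallucination.ANATOMICAL_MISLOCALIZATION, None,
    SpatialHallucination.EXISTENCE_FABRICATION, None, None,
    SpatialHallucination.ANATOMICAL_MISLOCALIZATION, None, None,
]

overall, per_tier = hallucination_rates(example_labels)
print(f"overall: {overall:.1%}")
for tier, rate in per_tier.items():
    print(f"{tier.value}: {rate:.1%}")
```

Dividing every tier count by the total number of answers, rather than by the number of hallucinated answers, keeps the per-tier rates on the same scale as the overall rate, so they sum to it under the one-label-per-answer assumption.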
