What question did this study set out to answer?

This research aims to enhance understanding of hallucinations in large visual and language models.

April 29, 2026 | Open Access

Understanding Hallucinations in Large Visual and Language Models

Key Points

Introduced a unified, multi-level framework for characterizing both image and text hallucinations.
Analyzed the model lifecycle to trace the root causes of hallucinations.
Examined tasks and modalities in an interleaved manner to identify the mechanisms driving hallucinations.
Revealed hallucinations as predictable consequences of biases and underlying distributions.
Demonstrated that a unified understanding can improve solutions to hallucinations.

Abstract

The rapid deployment of large language and vision models in real-world applications has intensified the need to address hallucinations: instances where models generate incorrect or incoherent outputs. These failures can spread misinformation and degrade workflows, causing financial and operational harm. Despite extensive research efforts, our understanding of hallucinations remains limited and fragmented. Without a clear understanding, solutions risk addressing disparate symptoms rather than root causes, which undermines their effectiveness and generalisability during deployment. To address this, we first introduce a unified, multi-level framework to characterise both image and text hallucinations across broad applications, helping reduce conceptual fragmentation. Then, we trace their root causes to identifiable mechanisms within a model’s lifecycle in a task-modality interleaved manner, fostering a deeper and more holistic understanding. Our investigations reveal hallucinations as predictable consequences of underlying distributions and biases. By deepening this understanding, this survey lays the groundwork for more effective solutions to hallucinations in generative AI systems.
