What question did this study set out to answer?

The central aim is to evaluate and mitigate relationship hallucinations in large vision-language models.

February 6, 2026

Evaluating and Mitigating Relationship Hallucinations in Large Vision-Language Models

Key Points

Developed a benchmark called R-Bench for evaluating inter-object relationship hallucinations.
Included image-level and instance-level questions to assess relationship perception.
Analyzed co-occurrences leading to relationship hallucinations and their impact on model performance.
Proposed the Region-Aware Alignment Mitigation (RA2M) technique to improve alignment between generated text and images.
Identified three types of co-occurrence that cause relationship hallucinations: relationship-relationship, subject-relationship, and relationship-object.
Demonstrated that existing models often ignore visual content and over-rely on common sense from the language model, particularly in spatial reasoning.
Showed that region-level image-text alignment reduces relationship hallucinations and improves model performance.

Abstract

The issue of hallucinations is a prevalent concern in existing Large Vision-Language Models (LVLMs). Previous efforts have primarily focused on investigating object hallucinations, which can be easily alleviated by introducing object detectors. However, these efforts neglect hallucinations in inter-object relationships, essential for visual comprehension. In this work, we introduce R-Bench, a novel benchmark specifically designed to evaluate hallucinations in visual relationships. R-Bench includes both image-level questions to assess the existence of relationships and instance-level questions that probe deeper into local visual comprehension. Our analysis reveals that relationship hallucinations arise from three types of co-occurrences: relationship-relationship, subject-relationship, and relationship-object, exacerbated by the long-tail distribution in visual datasets. Moreover, LVLMs often ignore visual content, over-relying on common sense from language models, particularly in spatial reasoning tasks. We further demonstrate that region-level image-text alignment helps mitigate relationship hallucinations and propose a new baseline, Region-Aware Alignment Mitigation (RA2M), that enhances model attention to relevant regions, improving alignment between generated text and images.
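
As an illustration of the three co-occurrence types named in the abstract, the following minimal sketch counts them over a toy set of (subject, relationship, object) triplets. It is not the paper's analysis procedure: the annotations, the function name `cooccurrence_counts`, and the choice to count relationship pairs once per image are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical scene-graph annotations (invented for illustration):
# one list of (subject, relationship, object) triplets per image.
annotations = [
    [("man", "riding", "horse"), ("man", "wearing", "hat")],
    [("woman", "riding", "bike"), ("woman", "wearing", "helmet")],
    [("man", "riding", "horse"), ("horse", "on", "grass")],
]

def cooccurrence_counts(images):
    """Count subject-relationship, relationship-object, and
    relationship-relationship co-occurrences across images."""
    subj_rel = Counter()
    rel_obj = Counter()
    rel_rel = Counter()
    for triplets in images:
        for subj, rel, obj in triplets:
            subj_rel[(subj, rel)] += 1
            rel_obj[(rel, obj)] += 1
        # Assumption: two relationships co-occur if both appear in the
        # same image; each unordered pair is counted once per image.
        rels = sorted({rel for _, rel, _ in triplets})
        for pair in combinations(rels, 2):
            rel_rel[pair] += 1
    return subj_rel, rel_obj, rel_rel

subj_rel, rel_obj, rel_rel = cooccurrence_counts(annotations)
print(subj_rel.most_common(3))
print(rel_obj.most_common(3))
print(rel_rel.most_common(3))
```

On real annotations, heavily skewed counts of this kind would reflect the long-tail distribution the abstract describes: frequent pairs such as a particular subject with a particular relationship are the ones a model may predict from priors rather than from the image.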
