What type of study is this?

This is a quantitative study.

September 30, 2025 · Open Access

Zero-Shot Referring Expression Comprehension via Visual-Language True/False Verification

Key Points

Our method surpasses a zero-shot GroundingDINO baseline on RefCOCO, RefCOCO+, and RefCOCOg.
Using zero-shot visual-language verification, we also exceed reported results for GroundingDINO models trained specifically for referring expression comprehension.
The zero-shot verification workflow queries each box independently, which reduces cross-box interference and improves the interpretation of visual-language queries.
Controlled studies with identical proposals confirm that our verification method significantly outperforms selection-based prompting.
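To illustrate the difference between verification and selection-based prompting, here is a minimal sketch. The prompt wording, the function names, and the box format are all assumptions made for illustration; the source does not give the actual prompts. The point is only structural: verification issues one independent True/False question per box, while selection asks a single question that forces the model to compare all boxes at once.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def verification_prompts(expression: str, boxes: List[Box]) -> List[str]:
    """Build one independent True/False question per box.

    Each prompt would be paired with its own region (for example a crop or
    a highlighted box), so no other candidate appears in the same query.
    """
    return [
        f'Does the highlighted region show "{expression}"? Answer True or False.'
        for _ in boxes
    ]


def selection_prompt(expression: str, boxes: List[Box]) -> str:
    """Build a single question that asks the model to pick one box index."""
    options = ", ".join(str(i) for i in range(len(boxes)))
    return (
        f'Which numbered box shows "{expression}"? '
        f"Options: {options}. Answer with one number."
    )


boxes = [(0, 0, 100, 100), (120, 40, 200, 160)]
per_box = verification_prompts("the dog on the left", boxes)
single = selection_prompt("the dog on the left", boxes)
```

With two candidate boxes, `per_box` holds two separate queries and `single` holds one query listing both options. A selection prompt always asks for exactly one answer, so it cannot naturally express "none of these" or "more than one".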

Abstract

Referring Expression Comprehension (REC) is usually addressed with task-trained grounding models. We show that a zero-shot workflow, without any REC-specific training, can achieve competitive or superior performance. Our approach reformulates REC as box-wise visual-language verification: given proposals from a COCO-clean generic detector (YOLO-World), a general-purpose VLM independently answers True/False queries for each region. This simple procedure reduces cross-box interference, supports abstention and multiple matches, and requires no fine-tuning. On RefCOCO, RefCOCO+, and RefCOCOg, our method not only surpasses a zero-shot GroundingDINO baseline but also exceeds reported results for GroundingDINO trained on REC and GroundingDINO+CRG. Controlled studies with identical proposals confirm that verification significantly outperforms selection-based prompting, and results hold with open VLMs. Overall, we show that workflow design, rather than task-specific pretraining, drives strong zero-shot REC performance.
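The box-wise verification loop described in the abstract can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: `verify_boxes` and `toy_verifier` are hypothetical names, and the toy verifier is a stand-in rule that replaces the real VLM call so the example runs on its own. In the actual workflow, the proposals would come from a detector such as YOLO-World and the verifier would be a general-purpose VLM answering True or False for each region.

```python
from typing import Callable, List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def verify_boxes(
    expression: str,
    boxes: List[Box],
    verifier: Callable[[Box, str], bool],
) -> List[int]:
    """Query the verifier once per box and keep every box it accepts.

    An empty result means abstention (no box matches the expression);
    a result with several indices means multiple matches.
    """
    return [i for i, box in enumerate(boxes) if verifier(box, expression)]


def toy_verifier(box: Box, expression: str) -> bool:
    """Stand-in for a VLM True/False call (assumption for illustration).

    Accepts any box wider than 50 pixels and ignores the expression.
    """
    x1, _, x2, _ = box
    return (x2 - x1) > 50


proposals = [(0, 0, 100, 100), (10, 10, 30, 30), (200, 50, 300, 120)]
matches = verify_boxes("the large object", proposals, toy_verifier)
no_match = verify_boxes("the large object", [(10, 10, 30, 30)], toy_verifier)
```

Here `matches` contains two indices (multiple matches) and `no_match` is empty (abstention). Because each decision depends on a single box, adding or removing other proposals does not change the verdict for a given region.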

Authors

Jeffrey Liu

Runjie Hu
