What type of study is this?

This is a quantitative study.

October 20, 2025

Open Access

ViQAgent: Zero-Shot Video Question Answering via Agent with Open-Vocabulary Grounding Validation

Key Points

The method shows enhanced performance on the NExT-QA, iVQA, and ActivityNet-QA benchmarks.
It integrates a Chain-of-Thought framework with grounding reasoning to improve object alignment.
The system utilizes YOLO-World for better object tracking over time within diverse video domains.
The framework supports cross-checking of grounding timeframes, which improves accuracy and output reliability.

Abstract

Recent advancements in Video Question Answering (VideoQA) have introduced LLM-based agents, modular frameworks, and procedural solutions, yielding promising results. These systems use dynamic agents and memory-based mechanisms to break down complex tasks and refine answers. However, significant room for improvement remains in two areas: tracking objects over time for grounding, and reasoning-based decision-making that better aligns object references with language model outputs. Newer models are getting better at both tasks. This work presents an LLM-brained agent for zero-shot VideoQA that combines a Chain-of-Thought framework with grounding reasoning alongside YOLO-World to enhance object tracking and alignment. This approach establishes a new state-of-the-art in VideoQA and Video Understanding, showing enhanced performance on the NExT-QA, iVQA, and ActivityNet-QA benchmarks. Our framework also enables cross-checking of grounding timeframes, improving accuracy and providing valuable support for verification and increased output reliability across multiple video domains. The code is available at https://github.com/t-montes/viqagent.
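The abstract does not say how grounding timeframes are cross-checked, so the following is only a minimal sketch of one plausible approach, not the authors' implementation. It assumes an open-vocabulary detector has already produced a per-frame presence flag for an object, merges those flags into time intervals, and compares an interval claimed by the reasoning step against them using temporal intersection-over-union. All function names, the `max_gap` parameter, and the 0.5 threshold are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def frames_to_intervals(present: List[bool], fps: float, max_gap: int = 0) -> List[Interval]:
    """Merge runs of frames where the object was detected into time intervals.

    `present[i]` is True if the (assumed) detector saw the object in frame i.
    Runs separated by at most `max_gap` missed frames are merged.
    """
    intervals: List[Interval] = []
    start = None     # first frame of the current run
    last_hit = None  # last frame in the current run with a detection
    gap = 0          # consecutive missed frames since last_hit
    for i, hit in enumerate(present):
        if hit:
            if start is None:
                start = i
            last_hit = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap > max_gap:
                intervals.append((start / fps, (last_hit + 1) / fps))
                start = None
                gap = 0
    if start is not None:
        intervals.append((start / fps, (last_hit + 1) / fps))
    return intervals


def temporal_iou(a: Interval, b: Interval) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def is_grounded(claimed: Interval, detected: List[Interval], threshold: float = 0.5) -> bool:
    """Accept a claimed timeframe if it overlaps enough with any detected interval."""
    return any(temporal_iou(claimed, d) >= threshold for d in detected)


if __name__ == "__main__":
    # Hypothetical detector output at 1 frame per second.
    flags = [False, True, True, True, False, False, True, True]
    detected = frames_to_intervals(flags, fps=1.0)
    print(detected)
    print(is_grounded((1.0, 3.5), detected))
    print(is_grounded((4.5, 5.5), detected))
```

In a design like this, a claimed timeframe that fails the check could be sent back to the reasoning step for revision, which is one way to read the verification support the abstract describes.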

Authors

Tony Montes

Fernando Lozano
