What question did this study set out to answer?

The primary aim is to enhance the detection of camouflaged objects in video sequences by utilizing motion reasoning techniques.

May 4, 2026

MRCNet: Motion Reasoning Chain for Cross Modal Video Camouflaged Object Detection.

Key points

Developed the Motion Reasoning Chain Network (MRCNet), a cross-modal framework for video camouflaged object detection.
Introduced generative sampling using multimodal large language models to create a motion reasoning chain.
Implemented hierarchical de-biased motion prototype learning and cross-modal prompt learning for improved visual representation.
MRCNet outperformed traditional methods on general metrics, achieving state-of-the-art results.
Demonstrated improved spatiotemporal consistency metrics in object identification across three datasets.

Abstract

Video camouflaged object detection (VCOD) aims to identify objects that blend seamlessly into their surroundings in video sequences. Traditional methods rely solely on visual cues to capture the inter-frame motion that reveals camouflaged objects. However, the high similarity between camouflaged objects and their environments often makes visual cues alone unreliable. In addition, random motion such as camera shake and abrupt scene transitions inevitably introduces noise into the identification process. To overcome these challenges, we propose the Motion Reasoning Chain Network (MRCNet), a novel cross-modal VCOD framework that emulates the human thought process when observing camouflaged objects, i.e., motion reasoning. Specifically, we introduce a generative sampling strategy grounded in multimodal large language models (MLLMs) to bridge the implicit knowledge space of MLLMs and the explicit representation space of camouflaged-object attributes, thereby enabling the effective establishment of a motion reasoning chain tailored for VCOD. This process provides semantic guidance for visual comprehension of camouflaged objects through motion and concept attribute reasoning. To improve the ability to identify camouflaged objects, we develop motion representation learning driven by the motion reasoning chain. It introduces hierarchical de-biased motion prototype learning to mitigate hallucinations of MLLMs, boosting motion perception. To learn precise prompts for the visual foundation model, cross-modal prompt learning further incorporates the de-biased concept prototype into visual representations to enhance the visual comprehension of camouflaged objects. Extensive experiments across three datasets demonstrate that MRCNet achieves state-of-the-art results on both general metrics and spatiotemporal consistency metrics.
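
The abstract's motivation, that inter-frame motion can reveal a camouflaged object but that camera shake adds noise, can be illustrated with a toy frame-differencing sketch. This is not MRCNet or any part of it; it is plain frame differencing on synthetic data, and every name here (`render`, `motion_mask`, `changed_fraction`), the texture formulas, and the frame sizes are assumptions made up for the illustration.

```python
def render(height, width, obj_x, obj_y, size=3, shake=0):
    """Render a synthetic frame: a textured background plus a small patch
    whose values fall in the same range as the background (toy camouflage).
    `shake` shifts the background texture horizontally, mimicking camera shake."""
    frame = [[(7 * (x + shake) + 13 * y) % 17 for x in range(width)]
             for y in range(height)]
    for dy in range(size):
        for dx in range(size):
            frame[obj_y + dy][obj_x + dx] = (5 * dx + 11 * dy + 3) % 17
    return frame


def motion_mask(prev, curr, thresh=0):
    """Mark pixels whose absolute difference between two frames exceeds thresh."""
    return [[1 if abs(a - b) > thresh else 0 for a, b in zip(row_p, row_c)]
            for row_p, row_c in zip(prev, curr)]


def changed_fraction(mask):
    """Fraction of pixels flagged as changed."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


# Static camera: the object moves one pixel to the right.
still_prev = render(16, 16, obj_x=4, obj_y=6)
still_curr = render(16, 16, obj_x=5, obj_y=6)
# Same object motion, but the camera also shifts by one pixel.
shaky_curr = render(16, 16, obj_x=5, obj_y=6, shake=1)

static_mask = motion_mask(still_prev, still_curr)
static_fraction = changed_fraction(static_mask)
shaky_fraction = changed_fraction(motion_mask(still_prev, shaky_curr))

print(f"static camera: {static_fraction:.3f} of pixels changed")
print(f"shaking camera: {shaky_fraction:.3f} of pixels changed")
```

With a static camera, the changed pixels are confined to the region the object passes through, so the difference mask localizes it. With the one-pixel shake, almost every background pixel changes as well and the object's motion is buried, which is the kind of noise the abstract says purely visual motion cues suffer from.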

Cite This Study

Hui et al. studied this question.

synapsesocial.com/papers/69f837233ed186a739981405 https://doi.org/10.1109/tpami.2026.3689767
