What type of study is this?

This is a quantitative study.

September 12, 2025 · Open Access

Prompt Self-Correction for SAM2 Zero-Shot Video Object Segmentation

Jin Lee (Chonnam National University), Jihun Bae (Waseda University), Dang Thanh Vu (American International School in Egypt)

Key Points

The proposed method improves temporal coherence in zero-shot evaluation, raising mean IoU on DAVIS, LVOS v2, and YouTube-VOS.
Mean IoU rises by 0.1 on DAVIS, 0.13 on the LVOS v2 train split, 0.05 on the LVOS v2 validation split, and 0.02 on YouTube-VOS.
Errors are detected from a sharp increase in the distance between object pointers across frames; a particle-filter-based re-inference module then predicts a corrected bounding box.
The self-correction procedure is parameter-free and makes SAM2 more robust to error accumulation.
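For readers unfamiliar with the metric behind these numbers, the following is a minimal sketch of mean IoU over binary masks. It is not the authors' evaluation code: the function names are hypothetical, the averaging is a plain per-frame mean, and scoring an empty prediction against an empty ground truth as 1.0 is an assumed convention. The benchmarks' official protocols may average per object or per sequence instead.

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def mean_iou(preds, gts):
    """Plain average of per-frame IoU over paired mask sequences."""
    return float(np.mean([mask_iou(p, g) for p, g in zip(preds, gts)]))
```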

Abstract

Foundation models, exemplified by the Segment Anything Model (SAM), have revolutionized object segmentation with their impressive zero-shot capabilities. The recent SAM2 extends these abilities to the video domain, using an object pointer and memory attention to keep segments temporally consistent. However, a critical limitation of SAM2 is its vulnerability to error accumulation: an initial incorrect mask can propagate through subsequent frames and lead to tracking failure. To address this, we propose a novel method that actively monitors the temporal consistency of masks by evaluating the distance between object pointers across frames. When a potential error is detected via a sharp increase in this distance, our method triggers a particle-filter-based re-inference module. This module models the object's motion to predict a corrected bounding box, guiding the model to recover a valid mask and preventing error propagation. Extensive zero-shot evaluations on DAVIS, LVOS v2, and YouTube-VOS, together with qualitative results, show that the proposed parameter-free procedure consistently improves temporal coherence, raising mean IoU by 0.1 on DAVIS, by 0.13 on the LVOS v2 train split, by 0.05 on the LVOS v2 validation split, and by 0.02 on YouTube-VOS, thereby offering a simple and effective route to more robust video object segmentation with SAM2.
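The two mechanisms named in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The cosine distance, the ratio test against a running mean, the constant-velocity motion model, the Gaussian likelihood, and every name and numeric default below are assumptions made for illustration. The abstract describes the actual procedure as parameter-free and does not say how a "sharp increase" is defined. Feeding the corrected box back to SAM2 as a prompt is omitted.

```python
import numpy as np

def pointer_distances(pointers):
    """Cosine distance between consecutive object-pointer vectors (assumed metric)."""
    p = np.asarray(pointers, dtype=float)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    return 1.0 - np.sum(p[1:] * p[:-1], axis=1)

def detect_spikes(distances, ratio=3.0):
    """Flag step t when its distance exceeds `ratio` times the mean of earlier steps.

    `ratio` is a hypothetical threshold, not a value taken from the paper.
    """
    flags = []
    for t, d in enumerate(distances):
        if t == 0:
            flags.append(False)  # no history to compare against yet
            continue
        baseline = max(float(np.mean(distances[:t])), 1e-8)
        flags.append(bool(d > ratio * baseline))
    return flags

class BoxParticleFilter:
    """Constant-velocity particle filter over a box centre; box size is held fixed."""

    def __init__(self, box, n=500, pos_noise=2.0, vel_noise=1.0, seed=0):
        cx, cy, w, h = box  # centre x, centre y, width, height
        self.rng = np.random.default_rng(seed)
        self.size = (float(w), float(h))
        self.particles = np.zeros((n, 4))  # columns: cx, cy, vx, vy
        self.particles[:, :2] = [cx, cy]
        self.weights = np.full(n, 1.0 / n)
        self.pos_noise, self.vel_noise = pos_noise, vel_noise

    def predict(self):
        """Advance every particle one frame with random velocity and position jitter."""
        n = len(self.particles)
        self.particles[:, 2:] += self.rng.normal(0.0, self.vel_noise, (n, 2))
        self.particles[:, :2] += self.particles[:, 2:]
        self.particles[:, :2] += self.rng.normal(0.0, self.pos_noise, (n, 2))

    def update(self, observed_center, sigma=3.0):
        """Reweight by a Gaussian likelihood of the observed centre, then resample."""
        n = len(self.particles)
        obs = np.asarray(observed_center, dtype=float)
        d2 = np.sum((self.particles[:, :2] - obs) ** 2, axis=1)
        w = np.exp(-0.5 * d2 / sigma**2) + 1e-300  # guard against all-zero weights
        w /= w.sum()
        idx = self.rng.choice(n, size=n, p=w)
        self.particles = self.particles[idx]
        self.weights = np.full(n, 1.0 / n)

    def estimate_box(self):
        """Return the weighted-mean box as (x0, y0, x1, y1)."""
        cx, cy = np.average(self.particles[:, :2], axis=0, weights=self.weights)
        w, h = self.size
        return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

In this sketch, a caller would run `predict` and `update` on frames whose masks look consistent. On a frame flagged by `detect_spikes`, it would call `predict` alone and use `estimate_box` as the corrected bounding box.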
