Video object segmentation (VOS) aims to distinguish and track target objects in a video. Although off-the-shelf VOS models achieve excellent performance, some of the existing VOS benchmarks focus mainly on short-term videos, where objects remain visible most of the time. These benchmarks may not fully capture the challenges encountered in practical applications, and the absence of long-term datasets restricts further investigation of VOS in realistic scenarios. We therefore propose a novel benchmark named LVOS, comprising 720 videos with 296,401 frames and 407,945 high-quality annotations. Videos in LVOS last 1.14 minutes on average. Each video includes various attributes, especially challenges encountered in the wild, such as long-term reappearance and cross-temporal similar objects. Compared to previous benchmarks, LVOS better reflects the performance of VOS models in real scenarios. Based on LVOS, we evaluate 15 existing VOS models under 3 different settings and conduct a comprehensive analysis. On LVOS, these models suffer a large performance drop, highlighting the challenge of achieving precise tracking and segmentation in real-world scenarios. Attribute-based analysis indicates that a significant factor in the accuracy decline is increased video length, which interacts with complex challenges such as long-term reappearance, cross-temporal confusion, and occlusion; this underscores the crucial role of LVOS. We hope that LVOS can advance the development of VOS in real scenes.
Lingyi Hong
City University of Hong Kong
Liu Zhong-ying
China Agricultural University
Wenchao Chen
Fudan University
IEEE Transactions on Pattern Analysis and Machine Intelligence
Fudan University
Hong et al. (Wed,) studied this question.
synapsesocial.com/papers/68d461b631b076d99fa607d5 — DOI: https://doi.org/10.1109/tpami.2025.3611020