What question did this study set out to answer?

The aim is to develop a stereo matching network that generalizes well across real-world environments via self-prompt guidance.

May 20, 2026 · Open Access

Sparse Self-Prompt-Guided Stereo Matching for Real-World Generalization

Key points

Developed a sparse self-prompt-guided network (SSPGNet) for stereo matching.
Collected diverse indoor and outdoor stereo pairs using a ZED 2 camera for real-world assessment.
Utilized a sparse-to-dense disparity prediction mechanism to enhance stereo correspondence reasoning.
Under the cross-domain (zero-shot) protocol, SSPGNet achieved bad-pixel error rates of 3.6% on KITTI 2012 (>3 px), 4.4% on KITTI 2015 (>3 px), 7.6% on Middlebury (>2 px), and 2.1% on ETH3D (>1 px).
Ranked first on three out of four public benchmarks for stereo matching performance.
These results indicate potential for direct deployment in real-world stereo perception systems.
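
The bad-pixel error rates quoted above are commonly computed as the percentage of valid ground-truth pixels whose absolute disparity error exceeds a threshold (3 px, 2 px, or 1 px here). The following is a minimal NumPy sketch under that common definition. The function name `bad_pixel_rate` and the convention that non-positive ground truth marks missing pixels are assumptions for illustration; individual benchmarks may apply their own masks or extra criteria.

```python
import numpy as np

def bad_pixel_rate(pred, gt, threshold, valid=None):
    """Percentage of valid pixels with |pred - gt| > threshold (in pixels)."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid is None:
        # Assumption: ground-truth disparity <= 0 means "no measurement".
        valid = gt > 0
    n_valid = int(valid.sum())
    if n_valid == 0:
        raise ValueError("no valid ground-truth pixels")
    bad = np.abs(pred - gt)[valid] > threshold
    return 100.0 * bad.sum() / n_valid

# Toy 2x3 disparity maps; the pixel with gt == 0 is ignored.
gt = np.array([[10.0, 20.0, 0.0], [30.0, 40.0, 50.0]])
pred = np.array([[10.5, 24.0, 7.0], [30.0, 38.5, 50.2]])

rate_3px = bad_pixel_rate(pred, gt, threshold=3.0)  # 1 of 5 valid pixels is off by more than 3 px
rate_1px = bad_pixel_rate(pred, gt, threshold=1.0)  # 2 of 5 valid pixels are off by more than 1 px
```

A tighter threshold can only raise the rate, so figures reported at different thresholds are not directly comparable across benchmarks.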

Abstract

Stereo matching has witnessed rapid advances on curated benchmarks, yet deploying models in unconstrained real-world environments remains a fundamental challenge. This paper presents a sparse self-prompt-guided network (SSPGNet) for stereo matching with strong generalization across diverse environments. Our core innovation lies in a sparse self-prompt guidance mechanism: (1) a sparse disparity map, used as a prompt, is self-estimated from visual foundation model features via cost aggregation; (2) the sparse disparity is progressively refined into dense disparity maps through cross-attention-based stereo feature interaction, enabling sparse-to-dense disparity prediction. Additionally, we collected a diverse set of indoor and outdoor stereo pairs using a ZED 2 camera to assess the real-world performance of our model. Extensive experiments demonstrate that the proposed sparse-to-dense prompt mechanism not only preserves the semantic awareness of visual foundation models but also enhances stereo correspondence reasoning, achieving strong performance on public benchmarks and our in-the-wild dataset. Specifically, under the cross-domain (zero-shot) protocol, the proposed SSPGNet achieves bad-pixel error rates of 3.6% on KITTI 2012 (>3 px), 4.4% on KITTI 2015 (>3 px), 7.6% on Middlebury (>2 px), and 2.1% on ETH3D (>1 px), ranking first on three of the four public benchmarks. These results highlight the potential of SSPGNet for direct deployment in real-world stereo perception systems. The code is publicly available on GitHub.
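
To make step (1) concrete, the sketch below shows one simple way a sparse disparity map could be derived from a pair of feature maps. It is not the SSPGNet implementation, whose details the abstract does not give. As stand-in assumptions, it uses a plain dot-product correlation volume instead of learned cost aggregation, a winner-take-all disparity, and a best-versus-second-best margin as the confidence used to keep only a fraction of pixels. Random arrays replace visual foundation model features, and all function and parameter names are hypothetical.

```python
import numpy as np

def correlation_cost_volume(feat_left, feat_right, max_disp):
    """Similarity volume of shape (max_disp, H, W) from (C, H, W) features."""
    c, h, w = feat_left.shape
    if not 2 <= max_disp <= w:
        raise ValueError("max_disp must be between 2 and the image width")
    volume = np.full((max_disp, h, w), -np.inf)
    for d in range(max_disp):
        # Left pixel x is compared with right pixel x - d.
        left = feat_left[:, :, d:]
        right = feat_right[:, :, : w - d]
        volume[d, :, d:] = (left * right).sum(axis=0) / c
    return volume

def sparse_disparity_prompt(volume, keep_ratio=0.2):
    """Keep winner-take-all disparities only at the most confident pixels."""
    max_disp, h, w = volume.shape
    # Only consider columns where every disparity candidate is in bounds.
    valid = np.zeros((h, w), dtype=bool)
    valid[:, max_disp - 1:] = True
    disp = volume.argmax(axis=0).astype(np.float64)
    # Confidence (assumption): margin between best and second-best score.
    top2 = np.sort(volume, axis=0)[-2:]
    margin = np.where(valid, top2[1] - top2[0], -np.inf)
    k = max(1, int(keep_ratio * valid.sum()))
    thresh = np.sort(margin[valid])[-k]
    mask = valid & (margin >= thresh)
    # NaN marks pixels without a prompt; 0 would be a legal disparity.
    sparse = np.where(mask, disp, np.nan)
    return sparse, mask

# Synthetic pair: the right features are the left ones shifted by 3 px.
rng = np.random.default_rng(0)
true_disp = 3
feat_left = rng.standard_normal((64, 4, 32))
feat_right = rng.standard_normal((64, 4, 32))
feat_right[:, :, :-true_disp] = feat_left[:, :, true_disp:]

volume = correlation_cost_volume(feat_left, feat_right, max_disp=8)
sparse, mask = sparse_disparity_prompt(volume, keep_ratio=0.2)
```

In a sparse-to-dense design, a map like `sparse` would then serve as the prompt that a refinement stage (cross-attention between stereo features, per the abstract) densifies; that stage is not sketched here.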
