Stereo matching has witnessed rapid advances on curated benchmarks, yet deploying models in unconstrained real-world environments remains a fundamental challenge. This paper presents a sparse self-prompt-guided network (SSPGNet) for stereo matching that generalizes strongly across diverse environments. Our core innovation is a sparse self-prompt guidance mechanism: (1) a sparse disparity map, used as a prompt, is self-estimated from visual foundation model features via cost aggregation; (2) the sparse disparity is progressively refined into dense disparity maps through cross-attention-based stereo feature interaction, enabling sparse-to-dense disparity prediction. Additionally, we collect a diverse set of indoor and outdoor stereo pairs with a ZED 2 camera to assess the real-world performance of our model. Extensive experiments demonstrate that the proposed sparse-to-dense prompt mechanism not only preserves the semantic awareness of visual foundation models but also enhances stereo correspondence reasoning, achieving strong performance on public benchmarks and on our in-the-wild dataset. Specifically, under the cross-domain (zero-shot) protocol, SSPGNet achieves bad-pixel error rates of 3.6% on KITTI 2012 (>3 px), 4.4% on KITTI 2015 (>3 px), 7.6% on Middlebury (>2 px), and 2.1% on ETH3D (>1 px), ranking first on three of the four public benchmarks. These results highlight the potential of SSPGNet for direct deployment in real-world stereo perception systems. The code is publicly available on GitHub.
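The bad-pixel error rates quoted above count the share of pixels whose disparity error exceeds a threshold (3 px, 2 px, or 1 px, depending on the benchmark). The following is a minimal sketch of that metric, not the official evaluation code of any benchmark: the function name `bad_pixel_rate`, the convention that non-positive ground truth marks unannotated pixels, and the toy arrays are assumptions made for illustration, and individual benchmarks may apply additional masks or criteria.

```python
import numpy as np


def bad_pixel_rate(pred, gt, threshold, valid=None):
    """Percentage of valid pixels whose absolute disparity error exceeds `threshold` (px)."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid is None:
        # Assumption: ground-truth disparity <= 0 means "no annotation here".
        valid = gt > 0
    n_valid = int(valid.sum())
    if n_valid == 0:
        return float("nan")  # nothing to evaluate
    err = np.abs(pred - gt)
    return 100.0 * float((err[valid] > threshold).sum()) / n_valid


# Toy 2x3 disparity maps (hypothetical values, not data from the paper).
gt = np.array([[10.0, 20.0, 0.0], [30.0, 40.0, 50.0]])
pred = np.array([[10.5, 24.0, 7.0], [30.0, 38.5, 50.9]])

for t in (1.0, 2.0, 3.0):
    print(f">{t:g} px: {bad_pixel_rate(pred, gt, t):.1f}%")
```

In this toy case five pixels are annotated; two of them are off by more than 1 px and one by more than 3 px, so the rates are 40% and 20% respectively. A lower threshold is a stricter test, which is why rates reported at different thresholds are not directly comparable across benchmarks.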