Stereo image super-resolution aims to reconstruct high-resolution images from low-resolution stereo pairs by leveraging complementary information between the two binocular views, a capability that underpins a wide range of computer vision applications. To address the limitations of existing methods in cross-view feature matching, particularly in weak-textured regions, we propose the Adaptive Multi-Scale Cross-Attention Stereo Image Super-Resolution Network (AMCASSR). The network comprises two principal modules: the Adaptive Multi-Scale Cross-Attention (AMSCA) module, which improves reconstruction in weak-textured regions by expanding the receptive field and adaptively fusing multi-scale features; and the Multi-Scale Cross-Attention Feature Block (MSCFB), which integrates intra-view feature learning with cross-view interaction. The network also optimizes cross-view interaction while maintaining computational efficiency. Experimental evaluations on the KITTI2012, KITTI2015, Middlebury, and Flickr1024 datasets show that AMCASSR achieves significant improvements in both PSNR and SSIM over current state-of-the-art methods, especially in weak-textured regions. Validation on downstream tasks, namely feature matching and stereo matching, further supports its practical applicability.
Sun et al. (Thu,) studied this question.
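The idea of cross-view attention fused across several scales can be illustrated with a minimal NumPy sketch. It is not the AMSCA module itself: the abstract does not specify the layer design, so the function names, the choice of scales, the average-pooling and nearest-neighbour resampling, and the `logits` fusion weights (a stand-in for whatever the network learns) are all assumptions made for illustration. Learned query, key, and value projections are omitted. Attention is restricted to pixels in the same image row, which assumes a rectified stereo pair.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def row_cross_attention(left, right):
    """Each left-view pixel attends over all right-view pixels in its row.

    left, right: feature maps of shape (H, W, C).
    Returns right-view features warped to the left view, shape (H, W, C).
    """
    c = left.shape[-1]
    scores = np.einsum("hic,hjc->hij", left, right) / np.sqrt(c)  # (H, W, W)
    attn = softmax(scores, axis=-1)
    return np.einsum("hij,hjc->hic", attn, right)

def avg_pool(x, s):
    # Non-overlapping s x s average pooling; trailing rows/columns are cropped.
    h, w, c = x.shape
    x = x[: h - h % s, : w - w % s]
    return x.reshape(h // s, s, w // s, s, c).mean(axis=(1, 3))

def upsample(x, s, shape):
    # Nearest-neighbour upsampling, edge-padded back to the original size.
    up = x.repeat(s, axis=0).repeat(s, axis=1)
    ph, pw = shape[0] - up.shape[0], shape[1] - up.shape[1]
    return np.pad(up, ((0, ph), (0, pw), (0, 0)), mode="edge")

def multi_scale_cross_attention(left, right, scales=(1, 2, 4), logits=None):
    """Fuse row-wise cross-attention outputs computed at several scales.

    `logits` is a hypothetical stand-in for learned fusion weights;
    it defaults to a uniform average over scales.
    """
    h, w, _ = left.shape
    outs = []
    for s in scales:
        out = row_cross_attention(avg_pool(left, s), avg_pool(right, s))
        outs.append(upsample(out, s, (h, w)))
    if logits is None:
        logits = np.zeros(len(scales))
    weights = softmax(np.asarray(logits, dtype=float))
    return sum(wt * o for wt, o in zip(weights, outs))
```

Coarser scales let each pixel draw on a wider spatial context in the other view, which is one plausible reading of how a larger receptive field could help where local texture is too weak to match reliably. In a trained network the fusion weights would typically be predicted from the features rather than fixed.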