Transformers have shown considerable promise in image super-resolution (SR) owing to their ability to model long-range dependencies. However, vision transformers treat an image as a 1D token sequence and therefore lack the inductive biases needed to model local visual patterns and scale invariance, both of which are essential for recovering local details. To address these limitations, we introduce EViTIB, a transformer-based image super-resolution network that integrates the inherent inductive biases of CNNs. EViTIB adopts a parallel structure in which each transformer layer contains a convolution branch alongside the multi-head self-attention branch. The outputs of the two branches are then fused by a Hybrid Feature Coupling (HFC) module. As a result, EViTIB benefits from the locality inductive bias of convolution while retaining the capacity to capture global dependencies. Extensive experiments show that, at comparable parameter counts and FLOPs, EViTIB outperforms recent state-of-the-art SR methods.
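The parallel layer described above can be sketched as follows. This is a minimal, dependency-free Python illustration under stated assumptions, not the authors' EViTIB implementation. The abstract does not describe the internals of the HFC module, so `hybrid_feature_coupling` here is a hypothetical concatenate-then-project fusion. Attention is simplified to a single head, the convolution branch is a zero-padded 3x3 depthwise convolution with one kernel shared across channels, the residual connection is assumed, and all function names and `params` keys are illustrative.

```python
import math
from typing import Dict, List

Vec = List[float]
Grid = List[List[Vec]]  # H x W x C feature map


def matvec(w: List[List[float]], x: Vec) -> Vec:
    """Multiply an (out x in) weight matrix by a length-`in` vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores: Vec) -> Vec:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vec], wq, wk, wv) -> List[Vec]:
    """Global branch: single-head scaled dot-product attention over all tokens."""
    q = [matvec(wq, t) for t in tokens]
    k = [matvec(wk, t) for t in tokens]
    v = [matvec(wv, t) for t in tokens]
    scale = math.sqrt(len(q[0]))
    out = []
    for qi in q:
        # every query attends to every token, which provides long-range context
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        out.append([sum(wt * vj[c] for wt, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def depthwise_conv3x3(grid: Grid, kernel: List[List[float]]) -> Grid:
    """Local branch: zero-padded 3x3 convolution (cross-correlation, as in
    deep-learning conv layers) applied to each channel separately.
    Assumption: one 3x3 kernel is shared by all channels to keep the sketch small."""
    h, w, c = len(grid), len(grid[0]), len(grid[0][0])
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:  # zero padding outside the image
                        k = kernel[di + 1][dj + 1]
                        for ch in range(c):
                            out[i][j][ch] += k * grid[y][x][ch]
    return out


def hybrid_feature_coupling(attn: List[Vec], conv: List[Vec], w_fuse) -> List[Vec]:
    """Hypothetical HFC stand-in: concatenate both branch outputs per token
    (2C channels) and project back to C channels with one linear map."""
    return [matvec(w_fuse, a + b) for a, b in zip(attn, conv)]


def parallel_layer(grid: Grid, params: Dict) -> Grid:
    """One layer with attention and convolution branches run in parallel."""
    h, w = len(grid), len(grid[0])
    tokens = [grid[i][j] for i in range(h) for j in range(w)]  # image as a 1D token sequence
    attn = self_attention(tokens, params["wq"], params["wk"], params["wv"])
    conv_grid = depthwise_conv3x3(grid, params["kernel"])  # same input, kept in 2D
    conv = [conv_grid[i][j] for i in range(h) for j in range(w)]
    fused = hybrid_feature_coupling(attn, conv, params["w_fuse"])
    # assumed residual connection, then reshape tokens back to H x W x C
    out = [[t + f for t, f in zip(tok, fu)] for tok, fu in zip(tokens, fused)]
    return [out[i * w:(i + 1) * w] for i in range(h)]


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    grid = [[[float(i), float(j)] for j in range(3)] for i in range(3)]
    params = {
        "wq": eye, "wk": eye, "wv": eye,
        "kernel": [[0.0, 0.1, 0.0], [0.1, 0.6, 0.1], [0.0, 0.1, 0.0]],
        "w_fuse": [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]],
    }
    out = parallel_layer(grid, params)
    print(len(out), len(out[0]), len(out[0][0]))  # prints 3 3 2: size and channels preserved
```

Feeding the same input to both branches is what lets one layer pair a locality prior (each convolution output depends only on a 3x3 neighbourhood) with global context (each attention output mixes every token). A learned HFC would replace the fixed `w_fuse` projection used here.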
Yu et al. studied this question.