Scene Text Image Super-Resolution (STISR) aims to increase the resolution of scene images that contain text so that the text becomes more legible and easier to recognize. The technique has applications in fields such as autonomous driving, document scanning, and image retrieval. However, most existing STISR methods do not fully exploit the multi-scale structural and semantic information in scene text images. As a result, the quality of the restored text images is insufficient, which significantly degrades downstream tasks such as text detection and recognition. This paper therefore proposes a novel scheme that leverages multi-scale structural and semantic priors to efficiently guide text semantic restoration, ultimately yielding high-quality text images. First, a multi-scale interaction attention (MSIA) module is designed to capture location-specific details of structural features at various scales and to facilitate the recovery of semantic information. Second, a multi-scale prior learning module (MSPLM) is developed, in which skip connections between encoders and decoders strengthen both structural and semantic prior features, thereby enhancing up-sampling and reconstruction. Finally, building on the MSPLM, cascaded encoders are linked through residual connections to further enrich the multi-scale features and bolster the representational capacity of the prior. Experiments on the standard TextZoom dataset show that the average recognition accuracies of three evaluators (ASTER, CRNN, and MORAN) are 64.4%, 53.5%, and 60.8%, respectively, surpassing most existing methods, including state-of-the-art ones.
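The two connection patterns named in the abstract, skip connections between encoder and decoder stages and residual connections across cascaded stages, can be illustrated with a toy sketch. This is not the paper's architecture: it operates on a plain 1-D list with fixed averaging and repetition instead of learned layers, it omits the attention module entirely, and every function name here is hypothetical. The input length is assumed to be divisible by 2 raised to the number of levels.

```python
# Toy 1-D illustration of multi-scale encoding with skip and residual
# connections. All names are hypothetical; there are no learned weights.

def downsample(x):
    """Halve the resolution by averaging adjacent pairs."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]

def upsample(x):
    """Double the resolution by repeating each value."""
    return [v for v in x for _ in range(2)]

def add(a, b):
    """Element-wise sum of two equal-length feature lists."""
    return [p + q for p, q in zip(a, b)]

def encode(x, levels):
    """Return features at progressively coarser scales, finest first."""
    feats = [x]
    for _ in range(levels):
        feats.append(downsample(feats[-1]))
    return feats

def decode(feats):
    """Upsample from the coarsest scale; at each scale, add the encoder
    feature of matching resolution (the skip connection)."""
    out = feats[-1]
    for skip in reversed(feats[:-1]):
        out = add(upsample(out), skip)
    return out

def prior_block(x, levels=2):
    """One encoder-decoder pass with skip connections."""
    return decode(encode(x, levels))

def cascade(x, blocks=2, levels=2):
    """Chain several blocks, wrapping each in a residual connection."""
    for _ in range(blocks):
        x = add(x, prior_block(x, levels))
    return x

if __name__ == "__main__":
    signal = [float(i) for i in range(8)]
    print(cascade(signal))
```

The skip connections let each decoding step reuse fine-scale detail that down-sampling would otherwise discard, while the residual wrapper means each cascaded stage only has to contribute a correction to its input.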
Zhu et al. (Tue,) studied this question.