ABSTRACT Optical character recognition (OCR) plays a crucial role in digitizing archives and documents. However, recognizing complex Chinese characters remains challenging owing to their intricate structures and sequential patterns. This study introduces an advanced OCR model that integrates EfficientNetV2 as the backbone within a transformer‐based architecture to enhance feature extraction. To address the limitations of traditional adaptive feature selection, we propose a dynamic collaborative channel–spatial attention (DCCSA) module. This module combines channel attention, spatial attention, and channel shuffling to dynamically capture global dependencies and optimize feature representations across both spatial and channel dimensions. Additionally, rotational position encoding (RoPE) is incorporated into the transformer to accurately capture the spatial relationships between characters and radicals, ensuring precise representation of complex hierarchical structures. Further, the model adopts a multitask learning framework that jointly decodes characters and radicals, enabling cross‐task optimization and significantly enhancing recognition performance. Experimental results on four benchmark datasets demonstrate that the proposed model outperforms existing methods, achieving significant improvements on both printed and handwritten Chinese text. Moreover, the model shows strong generalization capabilities on challenging scene‐text datasets, underscoring its effectiveness in addressing the OCR challenges associated with intricate scripts.
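As an illustration of the position-encoding idea mentioned in the abstract, the following is a minimal sketch of the standard one-dimensional rotary formulation, in which consecutive feature pairs are rotated by position-dependent angles. It is an assumption that the model uses something of this general form; the abstract does not give the exact variant (for example, a two-dimensional extension), and the function names `rope` and `dot`, the base of 10000, and the sample vectors are hypothetical choices made only for this example.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles that grow with `pos`.

    Assumed standard 1-D rotary scheme; not taken from the paper itself.
    """
    d = len(vec)
    if d % 2:
        raise ValueError("feature dimension must be even")
    out = []
    for i in range(0, d, 2):
        # Each pair has its own frequency; higher pairs rotate more slowly.
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    """Plain inner product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical query/key features for two tokens.
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# Both pairs of positions are 2 apart, so the scores should agree.
score_a = dot(rope(q, 3), rope(k, 1))
score_b = dot(rope(q, 10), rope(k, 8))
```

Under these assumptions, the score between a rotated query and a rotated key depends only on the offset between their positions, not on the absolute positions, which is the property that makes this kind of encoding suited to modelling relative layout such as that between characters and radicals.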
Deng et al. (Wed,) studied this question.