What type of study is this?

This is an experimental study.

September 19, 2025 | Open Access

Multi‐Task Learning for Chinese Character and Radical Recognition With Dynamic Channel‐Spatial Attention and Rotational Positional Encoding

Key points

The proposed model significantly enhances character recognition performance for both printed and handwritten text.
Experimental results reveal that the model outperforms previous methods across four benchmark datasets.
Integration of dynamic channel-spatial attention optimizes feature representations by focusing on global dependencies.
Rotational position encoding effectively captures complex spatial relationships in the hierarchical structures of Chinese characters.
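
The rotational position encoding named in the last key point is not specified in this summary. Assuming it follows the widely used rotary formulation, in which consecutive pairs of features are rotated by an angle proportional to the token position, a minimal sketch could look like the following. The function names `rope` and `dot` and the `base` value of 10000 are illustrative assumptions, not details taken from the paper.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive feature pairs of `vec` by position-dependent angles.

    Assumption: standard rotary scheme, where pair i (features i and i + 1)
    uses the frequency base ** (-i / dim).
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # 2-D rotation of the pair (x, y) by angle theta
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))
```

Under this formulation, the score `dot(rope(q, m), rope(k, n))` depends only on the offset `m - n`, which is why a rotary encoding can represent relative placement without a separate learned position table.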

Abstract

Optical character recognition (OCR) plays a crucial role in digitizing archives and documents. However, recognizing complex Chinese characters remains challenging owing to their intricate structures and sequential patterns. This study introduces an advanced OCR model that integrates EfficientNetV2 as the backbone within a transformer-based architecture to enhance feature extraction. To address the limitations of traditional adaptive feature selection, we propose a dynamic collaborative channel–spatial attention (DCCSA) module. This module combines channel attention, spatial attention, and channel shuffling to dynamically capture global dependencies and optimize feature representations across both spatial and channel dimensions. Additionally, rotational position encoding (RoPE) is incorporated into the transformer to accurately capture the spatial relationships between characters and radicals, ensuring precise representation of complex hierarchical structures. Furthermore, the model adopts a multitask learning framework that jointly decodes characters and radicals, enabling cross-task optimization and significantly enhancing recognition performance. Experimental results on four benchmark datasets demonstrate that the proposed model outperforms existing methods, achieving significant improvements on both printed and handwritten Chinese text. Moreover, the model shows strong generalization capabilities on challenging scene-text datasets, underscoring its effectiveness in addressing the OCR challenges associated with intricate scripts.
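
The abstract names three ingredients of the DCCSA module (channel attention, spatial attention, and channel shuffling) without giving their internals. The sketch below illustrates generic, parameter-free versions of each on a feature map stored as nested lists `[channel][row][column]`. It is not the authors' module, which presumably uses learned weights; every function name and the sigmoid-of-mean gating are assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Scale each channel by a sigmoid gate of its global average (assumed gating)."""
    out = []
    for ch in fmap:
        values = [v for row in ch for v in row]
        gate = sigmoid(sum(values) / len(values))
        out.append([[v * gate for v in row] for row in ch])
    return out


def spatial_attention(fmap):
    """Scale each location by a sigmoid gate of its mean across channels (assumed gating)."""
    channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    gates = [[sigmoid(sum(fmap[c][i][j] for c in range(channels)) / channels)
              for j in range(width)] for i in range(height)]
    return [[[fmap[c][i][j] * gates[i][j] for j in range(width)]
             for i in range(height)] for c in range(channels)]


def channel_shuffle(fmap, groups):
    """Interleave channels across groups so information mixes between groups."""
    channels = len(fmap)
    if channels % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = channels // groups
    # Take the k-th channel of every group in turn.
    return [fmap[g * per_group + k] for k in range(per_group) for g in range(groups)]
```

A combined pass under these assumptions would be `channel_shuffle(spatial_attention(channel_attention(fmap)), groups=2)`; the order in which the paper applies the three steps is not stated in this summary.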
