March 3, 2026 | Open Access

Deep Learning-Based Detection and Recognition of Chinese Characters in Natural Scenes

Key Points

The detection module achieves a recall of 88% on challenging natural scene images.
With a precision of 93.5%, the detection module reliably identifies text regions against complex backgrounds.
The pipeline chains three lightweight stages: candidate region extraction with MSER, text region detection with a shallow CNN, and text recognition with a CTC-based recurrent network.
A character-level accuracy of 91.97% shows that the recognition module can read text without character-level segmentation.
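As a reading aid for the figures above, here is a minimal sketch of how detection precision and recall, and a character-level accuracy, are commonly computed. The function names and the edit-distance definition of character accuracy are assumptions for illustration; the study does not state which exact definition it uses, and the counts in the usage example are hypothetical.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a detector from raw counts."""
    detected = true_positives + false_positives   # regions the detector reported
    actual = true_positives + false_negatives     # regions that really contain text
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def character_accuracy(predictions, references):
    """Assumed definition: 1 - (total edit distance / total reference characters)."""
    total = sum(len(r) for r in references)
    errors = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    return 1 - errors / total if total else 0.0


# Hypothetical counts, chosen only to illustrate the arithmetic.
p, r = precision_recall(true_positives=88, false_positives=6, false_negatives=12)
```

With these hypothetical counts, recall is 88 / (88 + 12) and precision is 88 / (88 + 6). The two numbers answer different questions: recall measures how many true text regions were found, and precision measures how many reported regions were truly text.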

Abstract

With the advancement of deep learning and optical character recognition (OCR) technology, extracting textual content from natural scene images has attracted increasing attention, especially in large-scale image analysis and multimedia content understanding. However, challenges remain in the end-to-end processing of real-world images with diverse backgrounds, distortions, and varying font styles. Traditional OCR systems often struggle with character segmentation and suffer from low recall in noisy or complex scenes. To address these issues, we propose a complete OCR pipeline composed of three lightweight but effective stages: candidate region extraction using the Maximally Stable Extremal Regions (MSER) algorithm, text region detection using a shallow Convolutional Neural Network (CNN), and text recognition using a network that combines convolutional, max-pooling, batch normalization, long short-term memory (LSTM), fully connected, and connectionist temporal classification (CTC) layers. This design avoids the need for character-level segmentation and allows the model to process arbitrary-length text sequences directly from cropped image regions. The detection and recognition models are trained on a custom dataset of over 800,000 synthetic images and real-world scene samples. The detection module yields a precision of 93.5% and a recall of 88%, while the recognition module attains a character-level accuracy of 91.97%. These results demonstrate that the proposed system handles diverse natural scene text with high reliability and can be applied in practical scenarios such as content indexing or automated annotation.
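To illustrate why a CTC output removes the need for character-level segmentation, the following is a minimal sketch of greedy (best-path) CTC decoding. It is not the study's implementation: the function name, the choice of index 0 as the blank label, the three-entry alphabet, and the per-frame scores are all assumptions made for this example.

```python
BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Best-path CTC decoding.

    frame_scores: one list of per-class scores for each time step (image column).
    alphabet: maps class index to character; the entry at `blank` is unused.
    """
    decoded = []
    previous = blank
    for scores in frame_scores:
        # Pick the most likely class for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a character only when it is not blank and not a repeat
        # of the class chosen for the immediately preceding frame.
        if best != blank and best != previous:
            decoded.append(alphabet[best])
        previous = best
    return "".join(decoded)


# Hypothetical scores for five frames over the classes (blank, "中", "国").
alphabet = ["", "中", "国"]
frames = [
    [0.1, 0.8, 0.1],  # "中"
    [0.1, 0.8, 0.1],  # repeated "中", collapsed into the previous one
    [0.9, 0.05, 0.05],  # blank, separates genuinely repeated characters
    [0.1, 0.7, 0.2],  # "中" again, kept because a blank came in between
    [0.1, 0.2, 0.7],  # "国"
]
text = ctc_greedy_decode(frames, alphabet)
```

The network emits one prediction per frame across the width of the cropped region, and the decoder collapses repeats and drops blanks. The number of output characters therefore does not have to match the number of frames, which is what lets the model read arbitrary-length text without locating individual characters first.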


Cite this study

Miao Peng studied this question.

synapsesocial.com/papers/69a7622fc6e9836116a306a9

Authors

Miao Peng
