Improving the accuracy of handwritten character string recognition allows handwritten documents to be converted into digital text. This facilitates camera-based text input and enables robotic process automation to handle documentation tasks. Although the field has seen significant progress, recognizing handwritten Japanese remains challenging because of the difficulty of character segmentation, the wide variety of character types, and the absence of clear word boundaries. These factors make unconstrained handwritten Japanese string recognition particularly difficult for conventional approaches. Moreover, transformer-based models typically require large amounts of annotated training data. This study proposes and investigates a String Recognition Transformer (SRT) model that recognizes unconstrained handwritten Japanese character strings without relying on explicit character segmentation or a large number of training images. The SRT model integrates a convolutional neural network (CNN) backbone for robust local feature extraction, a Transformer encoder-decoder architecture, and a sliding window strategy that generates overlapping patches. In comparative experiments, the method achieved a character error rate (CER) of 0.067, significantly outperforming a convolutional recurrent neural network, transformer-based optical character recognition, and handwritten text recognition with Vision Transformer, which achieved CERs of 0.664, 0.165, and 0.106, respectively, confirming the effectiveness and robustness of the approach.
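The abstract reports results as a character error rate (CER) but does not spell out how it is computed. The sketch below assumes the commonly used definition, total Levenshtein edit distance divided by the total number of reference characters; the paper's exact normalization may differ. The function names `edit_distance` and `cer` are illustrative and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Assumed CER definition: total edits / total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference set contains no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars
```

Under this definition, a CER of 0.067 corresponds to roughly 6.7 character-level edits per 100 reference characters. Because Python strings are sequences of Unicode code points, each kana or kanji character counts as one unit.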
Shunya Rakuka
Kento Morita
Tetsushi Wakabayashi
Journal of Advanced Computational Intelligence and Intelligent Informatics
Mie University
Rakuka et al. studied this question.
www.synapsesocial.com/papers/69706c09b6488063ad5c1710 — DOI: https://doi.org/10.20965/jaciii.2026.p0015