Digitizing handwritten documents is vital in domains such as education, healthcare, and commerce because it improves access and preservation. However, offline handwritten word recognition (HWR) remains a major challenge, particularly for complex Indic scripts such as Devanagari, owing to the cursive nature of the writing and the similarity between character forms. Several studies have addressed this problem, but most have focused on English datasets. Those that explored Indic scripts often rely on a combination of real and synthetic data (data augmentation) and on predefined lexicon-based decoding. In this paper, we propose SCRAT-Net, a novel attention-based deep learning approach that incorporates an attention mechanism within the CNN-RNN architecture to recognize words from handwritten images. The SCRAT-Net pipeline consists of five key components: STN, CNN, RNN, Attention, and Transcription. Unlike many existing methods, the proposed approach does not rely on a predefined external lexicon or on synthetic data augmentation. We evaluate SCRAT-Net against several state-of-the-art methods on two datasets, IIIT-HW-Dev (Hindi) and IAM (English), and it outperforms them in terms of the standard Word Error Rate (WER) and Character Error Rate (CER). Our results show that SCRAT-Net achieves improvements in CER of approximately 50% and 7% on the IIIT-HW-Dev and IAM datasets, respectively, compared to the best-performing competitor.
Rastogi et al. studied this question.
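To make the two reported metrics concrete, the following is a minimal sketch of how CER and WER are commonly computed from Levenshtein edit distance. The abstract does not state the exact definitions used in the paper, so the normalization (total edits divided by total reference length) and the helper names `edit_distance`, `cer`, and `wer` are assumptions made for illustration only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items yields an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Character Error Rate: character edits over reference characters (assumed definition)."""
    edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return edits / chars


def wer(references, hypotheses):
    """Word Error Rate: word-level edits over reference words (assumed definition)."""
    edits = sum(edit_distance(r.split(), h.split())
                for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return edits / words


if __name__ == "__main__":
    refs = ["hello", "world"]   # hypothetical ground-truth word labels
    hyps = ["hallo", "world"]   # hypothetical recognizer outputs
    print(f"CER = {cer(refs, hyps):.2f}, WER = {wer(refs, hyps):.2f}")
```

When each sample is a single word image, as in word-level recognition, this WER reduces to the fraction of words that are not recognized exactly. The function operates on Unicode code points, so for Devanagari the notion of a "character" (code point versus grapheme cluster) is a further choice that the abstract does not specify.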