June 2, 2026 | Open Access

A Study on Publicly Available Datasets and Preprocessing Techniques for Handwritten Devanagari Character Recognition

Key Points

The research aims to enhance handwritten Devanagari character recognition by utilizing public datasets and effective preprocessing techniques.
The study selected the Devanagari Handwritten Character Dataset (DHCD) and the ECO-LAPS dataset for analysis.
It designed a preprocessing pipeline consisting of normalization, grayscale conversion, binarization, and noise removal.
The pipeline also includes morphological processing, shirorekha (headline) removal, contour refinement, and size standardization.
Standardizing the dataset reduces intra-class variation, which improves recognition reliability.
In many practical scenarios, preprocessing contributes more to system reliability than the choice of classifier.
A robust preparation process ensures reproducibility and fair evaluation in optical character recognition systems.

Abstract

Handwritten character recognition is a fundamental research area in pattern recognition and document image analysis. It has wide-ranging applications such as digitization of handwritten documents, automation of data entry systems, archival of historical manuscripts, and development of assistive technologies. In the Indian context, the Devanagari script holds special importance as it is used by several major languages including Hindi, Marathi, Sanskrit, Nepali, and Konkani. However, the recognition of handwritten Devanagari characters remains a challenging task due to the complexity of character structures, the presence of modifiers, the shirorekha (headline), and the large variability in handwriting styles. While many recent studies focus on deep-learning-based recognition systems, the importance of dataset quality and preprocessing remains fundamental for building any reliable optical character recognition (OCR) system. Poor preprocessing leads to noisy feature representations, reduced classification accuracy, and poor generalization across writers. In many practical scenarios, preprocessing contributes more to system reliability than the choice of classifier itself. This paper focuses on the first and most essential objective of handwritten Devanagari OCR: the use of publicly available benchmark datasets and their systematic preprocessing to obtain clean, standardized handwritten character samples for training and evaluation. Two widely used datasets, namely the Devanagari Handwritten Character Dataset (DHCD) and the ECO-LAPS dataset, are selected for this study. A complete preprocessing pipeline is designed consisting of normalization, grayscale conversion, binarization, noise removal, morphological processing, shirorekha removal, contour refinement, and size standardization. The study highlights the importance of dataset preparation in reducing intra-class variations and improving recognition reliability. The resulting standardized dataset forms a strong foundation for classical machine-learning-based OCR systems and ensures reproducibility and fair evaluation.
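
To make the pipeline described in the abstract more concrete, the following is a minimal sketch of five of its stages (grayscale conversion, binarization, noise removal, shirorekha removal, and size standardization) for a single character image. It is not the authors' implementation. Every function name, the choice of Otsu's method for binarization, the isolated-pixel noise filter, the projection-profile heuristic for locating the shirorekha, the 0.8 ratio, the default output size of 32, and the assumption of dark ink on a light background are illustrative assumptions. Normalization, morphological processing, and contour refinement are omitted. Images are plain lists of rows so the sketch needs only the Python standard library; a practical pipeline would more likely use an image-processing library.

```python
# Illustrative sketch only: names, thresholds, and heuristics are assumptions,
# not the method used in the study.

def to_grayscale(rgb):
    """Convert rows of (r, g, b) tuples to 0-255 luminance (BT.601 weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def otsu_threshold(gray):
    """Return the gray level that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    weighted_total = sum(level * count for level, count in enumerate(hist))
    best_threshold, best_variance = 0, -1.0
    below_count = below_sum = 0
    for level in range(256):
        below_count += hist[level]
        below_sum += level * hist[level]
        above_count = total - below_count
        if below_count == 0:
            continue
        if above_count == 0:
            break
        mean_below = below_sum / below_count
        mean_above = (weighted_total - below_sum) / above_count
        variance = below_count * above_count * (mean_below - mean_above) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


def binarize(gray, dark_ink=True):
    """Map ink pixels to 1 and background to 0 (polarity is an assumption)."""
    t = otsu_threshold(gray)
    return [[int((v <= t) if dark_ink else (v > t)) for v in row] for row in gray]


def remove_isolated_pixels(binary):
    """Simple noise filter: drop ink pixels that have no 8-connected ink neighbor."""
    h, w = len(binary), len(binary[0])

    def has_neighbor(y, x):
        return any(
            binary[ny][nx]
            for ny in range(max(0, y - 1), min(h, y + 2))
            for nx in range(max(0, x - 1), min(w, x + 2))
            if (ny, nx) != (y, x)
        )

    return [[int(binary[y][x] and has_neighbor(y, x)) for x in range(w)]
            for y in range(h)]


def remove_shirorekha(binary, ratio=0.8):
    """Heuristic: clear upper-half rows whose ink count is near the row-profile peak."""
    upper = len(binary) // 2
    profile = [sum(row) for row in binary]  # horizontal projection profile
    peak = max(profile[:upper], default=0)
    if peak == 0:
        return [row[:] for row in binary]
    return [
        [0] * len(row) if y < upper and profile[y] >= ratio * peak else row[:]
        for y, row in enumerate(binary)
    ]


def crop_and_resize(binary, size):
    """Crop to the ink bounding box, then nearest-neighbor resize to size x size."""
    rows = [y for y, row in enumerate(binary) if any(row)]
    cols = [x for x in range(len(binary[0])) if any(row[x] for row in binary)]
    if not rows:
        return [[0] * size for _ in range(size)]
    top, left = rows[0], cols[0]
    h, w = rows[-1] - top + 1, cols[-1] - left + 1
    return [
        [binary[top + i * h // size][left + j * w // size] for j in range(size)]
        for i in range(size)
    ]


def preprocess(rgb, size=32, dark_ink=True):
    """Run the sketched stages in order on one RGB character image."""
    binary = binarize(to_grayscale(rgb), dark_ink)
    binary = remove_isolated_pixels(binary)
    binary = remove_shirorekha(binary)
    return crop_and_resize(binary, size)
```

Calling `preprocess(image, size=32)` on a list of RGB rows returns a 32 x 32 grid of 0/1 values. This resize stretches the cropped character to fill the square, so aspect ratio is not preserved; padding the crop to a square first is a common alternative when stroke proportions matter.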
