Historical document recognition is vital to preserving cultural heritage because it facilitates access to, search of, and analysis of important archival materials. However, current techniques struggle with poor image quality, diverse handwriting styles, erased or missing words, damaged documents, and intricate page layouts. These challenges hinder precise text extraction and reduce the overall efficiency of automated recognition systems. In this work, a Pine Makeup Optimization enabled Convolutional Generative Transformer Network (PMOCGTN) is proposed for missing character recognition in historical documents. First, the input historical document image is enhanced using the Multi-scale Gray World Algorithm. Then, each text line, and each word within the lines, is segmented using the Semantic Text Segmentation Network (STSN). Next, missing characters are recognized and filled in using the CGTN. Here, the CGTN is formed by modifying a Convolutional Neural Network (CNN) with a Generative Pre-Trained Transformer (GPT) layer, and it is trained using Pine Makeup Optimization (PMO), which merges the Pine Cone Optimization Algorithm (PCOA) and the Makeup Artist Optimization Algorithm (MAOA). Finally, an Optical Character Recognition (OCR) document is obtained as the output.
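To make the enhancement step more concrete, the following is a minimal sketch of the classic single-scale gray world correction, which rescales each color channel so that the channel means become equal. It is not the multi-scale variant used in this work, whose details are not given here; the names `gray_world`, `Pixel`, and `Image`, and the nested-list image representation, are assumptions made purely for illustration.

```python
from typing import List, Tuple

# Hypothetical image representation: rows of (R, G, B) pixels with values 0..255.
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def gray_world(image: Image) -> Image:
    """Classic single-scale gray world balance (illustrative assumption,
    not the multi-scale algorithm named in the abstract)."""
    pixel_count = sum(len(row) for row in image)
    if pixel_count == 0:
        return [list(row) for row in image]

    # Accumulate the per-channel sums over the whole image.
    sums = [0.0, 0.0, 0.0]
    for row in image:
        for px in row:
            for c in range(3):
                sums[c] += px[c]

    means = [s / pixel_count for s in sums]
    gray = sum(means) / 3.0  # target value shared by all channels

    # Gain per channel; leave an all-zero channel unchanged.
    gains = [gray / m if m > 0 else 1.0 for m in means]

    # Apply the gains and clamp to the valid 8-bit range.
    return [
        [
            tuple(min(255, max(0, round(px[c] * gains[c]))) for c in range(3))
            for px in row
        ]
        for row in image
    ]


# A yellowish cast, as might appear on an aged page, is neutralized to gray.
aged = [[(200, 180, 100), (100, 90, 50)]]
balanced = gray_world(aged)
```

In this small example the channel means are 150, 135, and 75, so the blue channel is amplified and the red and green channels are attenuated until all three means equal 120.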
Rachoti et al. (Thu,) studied this question.