What does this research mean for the field?

A newly developed open-source optical character recognition (OCR) tool built on a Convolutional Recurrent Neural Network can accurately transcribe Gə'əz manuscripts and runs efficiently without a graphics processing unit (GPU). Novelty: methodological. Consensus alignment: neutral.

What question did this study set out to answer?

The research aims to create a tool for transcribing Gə'əz manuscripts using deep learning techniques.

March 22, 2026 · Open Access

Automated Transcription of Gə'əz Manuscripts Using Deep Learning

Key Points

Developed an open-source optical character recognition (OCR) tool for Gə'əz manuscripts
Incorporated a convolutional recurrent neural network for transcription
Designed a custom data curation process for Gə'əz language
Ensured the tool can operate offline and without a GPU
The tool achieves high accuracy in transcribing Gə'əz manuscripts
Accessible for students and scholars interested in Ethiopian manuscripts
Can potentially be retrained for other under-resourced scripts
Requires much less computing power than most other modern AI systems

Abstract

This paper describes a collaborative project designed to meet the needs of communities interested in Gə'əz language texts – and other under-resourced manuscript traditions – by developing an easy-to-use open-source tool that converts images of manuscript pages into a transcription using optical character recognition (OCR). Our computational tool incorporates a custom data curation process to address the language-specific facets of Gə'əz coupled with a Convolutional Recurrent Neural Network to perform the transcription. An open-source OCR transcription tool for digitized Gə'əz manuscripts can be used by students and scholars of Ethiopian manuscripts to create a substantial and computer-searchable corpus of transcribed and digitized Gə'əz texts, opening access to vital resources for sustaining the history and living culture of Ethiopia and its people. With suitable ground-truth, our open-source OCR transcription tool can also be retrained to read other under-resourced scripts. The tool we developed can be run without a graphics processing unit (GPU), meaning that it requires much less computing power than most other modern AI systems. It can be run offline from a personal computer, or accessed via a web client and potentially in the web browser of a smartphone. The paper describes our team’s collaborative development of this first open-source tool for Gə'əz manuscript transcription that is both highly accurate and accessible to communities interested in Gə'əz books and the texts they contain.
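The abstract does not say how the network's output is turned into text. Convolutional Recurrent Neural Networks for text recognition are often paired with Connectionist Temporal Classification (CTC), so the following is a minimal sketch of greedy CTC decoding under that assumption, not the authors' implementation. The function name `ctc_greedy_decode`, the three-character alphabet, and the per-frame scores are all hypothetical and chosen only for illustration.

```python
from itertools import groupby

BLANK = 0  # index reserved for the CTC blank symbol (an assumption of this sketch)


def ctc_greedy_decode(frame_scores, alphabet):
    """Turn per-frame class scores into text by greedy CTC decoding.

    Steps: pick the best class in each frame, merge consecutive repeats,
    then drop the blank symbol.
    """
    # Best-scoring class index for every time step (image column).
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    # Merge runs of the same index, e.g. 1,1,0,2,2 becomes 1,0,2.
    collapsed = [index for index, _ in groupby(best_path)]
    # Remove blanks and map the remaining indices to characters.
    return "".join(alphabet[index] for index in collapsed if index != BLANK)


# Hypothetical toy alphabet: blank plus three Ethiopic characters.
alphabet = ["<blank>", "ሰ", "ላ", "ም"]

# Hypothetical scores for seven frames; each row is one frame.
frame_scores = [
    [0.10, 0.80, 0.05, 0.05],  # ሰ
    [0.20, 0.70, 0.05, 0.05],  # ሰ again (merged with the previous frame)
    [0.90, 0.05, 0.03, 0.02],  # blank
    [0.10, 0.10, 0.70, 0.10],  # ላ
    [0.15, 0.05, 0.75, 0.05],  # ላ again (merged)
    [0.85, 0.05, 0.05, 0.05],  # blank
    [0.05, 0.05, 0.10, 0.80],  # ም
]

print(ctc_greedy_decode(frame_scores, alphabet))
```

The blank symbol is what lets such a decoder keep genuinely doubled characters: two frames of the same character separated by a blank yield two characters, while adjacent identical frames collapse into one.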
