What question did this study set out to answer?

The aim is to develop an efficient deep learning framework for recognizing textual stamps on index cards from the Lessico Etimologico Italiano.

March 7, 2026

Deep Learning for Textual Stamp Recognition on Index Cards of the Lessico Etimologico Italiano

Key Points

Utilized a deep learning workflow for processing scanned index cards.
Implemented automatic detection, alignment, and recognition of textual stamps.
Developed an embedding-based retrieval system for identifying unseen stamps.
Conducted experimental evaluations comparing against recent OCR methods.
Achieved a mean average precision of 98.80% for stamp detection.
Obtained an accuracy of 97.02% for stamp recognition.
Surpassed the OCR accuracy of two large language models (91.60% and 89.47%).
Outperformed a multimodal large language model on a benchmark dataset (98.61% accuracy versus up to 84.58%).

Abstract

As a long-term scientific project on the history of the Italian language, the Lessico Etimologico Italiano (LEI) represents one of the most important and ambitious historical and etymological dictionary projects ever undertaken. The LEI, which started in 1968, documents and analyzes every single word of the Italian language and all Italian dialects from their beginnings to today. Until 2018, the editorial process used the traditional lexicographical method of creating annotated index cards to collect information. Each index card contains text areas, handwritten annotations, and/or textual stamps. Of particular interest are the etymon, the context (the source text), and the textual stamp, which contains the abbreviated title of the book or source from which the text was copied or taken. In this paper, we present a novel approach for efficiently processing a large number of scanned index cards to accelerate the philological work required for producing the LEI. For this purpose, a deep learning workflow for automatic detection, alignment, and recognition of textual stamps on digitized index cards is proposed. To support large-scale indexing under an initially incomplete stamp inventory, we introduce an embedding-based retrieval workflow that enables the identification and integration of previously unseen stamps during operation. Our experimental evaluations show excellent results for stamp detection and stamp recognition, with a mean average precision of 98.80% and an accuracy of 97.02%, respectively. In addition, we compare our stamp recognition approach with two recent Large Language Model (LLM) approaches for OCR. Our best approach achieves 97.02% accuracy, which surpasses the OCR performance of the two LLMs (91.60% and 89.47%, respectively). Furthermore, we compare our approach to a recent multimodal large language model (MM-LLM) on a benchmark dataset and observe substantially lower accuracy for the MM-LLM (up to 84.58%) compared to our approach (98.61%).
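The abstract does not spell out how the embedding-based retrieval works, so the following is only a minimal sketch of the general idea under stated assumptions: each detected stamp is mapped to a vector by some trained network (not shown), a query is matched to its nearest known stamp by cosine similarity, and a query whose best match falls below a threshold is treated as a previously unseen stamp and added to the inventory. The class name `StampIndex`, the threshold value, the labels, and the choice of cosine similarity are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class StampIndex:
    """Hypothetical inventory of known stamps, stored as (label, embedding) pairs."""

    def __init__(self, threshold=0.8):
        # Assumed similarity cutoff; a real system would tune this on held-out data.
        self.threshold = threshold
        self.entries = []

    def add(self, label, embedding):
        self.entries.append((label, list(embedding)))

    def query(self, embedding):
        """Return (label, score) of the nearest known stamp, or (None, score) if too dissimilar."""
        best_label, best_score = None, -1.0
        for label, reference in self.entries:
            score = cosine_similarity(embedding, reference)
            if score > best_score:
                best_label, best_score = label, score
        if best_label is None or best_score < self.threshold:
            return None, best_score
        return best_label, best_score

    def identify_or_register(self, embedding, new_label):
        """Match against the inventory; register the stamp under new_label if nothing matches."""
        label, _ = self.query(embedding)
        if label is None:
            self.add(new_label, embedding)
            return new_label, True
        return label, False
```

In use, `identify_or_register` would be called once per detected stamp with the embedding produced by the recognition network. The boolean it returns signals that a new inventory entry was created, which is the point at which a human editor could confirm or rename the stamp.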


Cite This Study

Korfhage et al. studied this question.

synapsesocial.com/papers/69abc1b45af8044f7a4ea941 https://doi.org/10.1093/ijl/ecag001
