What type of study is this?

This is a quantitative study.

September 18, 2025

Benchmarking large language models for handwritten text recognition

Key Points

LLMs demonstrate strong performance on English text but struggle with other languages, impacting their overall effectiveness.
Compared with task-specific models such as Transkribus, proprietary LLMs perform best on modern handwriting.
The evaluation used openly available benchmarks and found that LLMs have no significant capability for self-correction.
Findings suggest potential enhancements for HTR applications, particularly in historical document analysis.

Abstract

Purpose: This work provides an overview of the current capabilities of Multimodal Large Language Models (MLLMs) for Handwritten Text Recognition (HTR), assessing their potential compared to traditional task-specific, supervised models.

Design/methodology/approach: A set of openly available benchmarks is used to compare different LLMs with strong task-specific supervised baselines for HTR.

Findings: LLMs currently perform strongly on English texts but more weakly on languages other than English, and they do not possess a significant capability for self-correction. Moreover, the comparison with Transkribus's models highlights that proprietary LLMs perform best, in particular on modern handwriting, while for historical documents the performance comparison between LLMs and Transkribus is not consistent.

Originality/value: The authors are not aware of a similar study relying on open benchmarks.
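The abstract does not name the metric used to score the models. As an illustration only, the sketch below assumes character error rate (CER), a common choice for HTR evaluation: the edit distance between a reference transcription and a model's output, divided by the reference length. The function names and sample strings are hypothetical and are not taken from the study.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of one hypothesis against one reference line."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def corpus_cer(pairs: list[tuple[str, str]]) -> float:
    """CER over many (reference, hypothesis) pairs, weighted by reference length."""
    total_edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total_chars = sum(len(ref) for ref, _ in pairs)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    return total_edits / total_chars
```

Under this assumption, comparing an LLM with a task-specific baseline on one benchmark would mean computing `corpus_cer` for each system over the same reference lines; the lower value indicates the better transcription.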

Authors

Giorgia Crosilla

Lukas Klic

Giovanni Colavizza

Journals

Journal of Documentation

Institutions

University of Copenhagen

University of Bologna

Digital Science (United States)
