March 23, 2026 · Open Access

Domain-Specific Adaptation of Vision-Language Models for Arabic OCR

Key Points

Aims to improve Arabic Optical Character Recognition (OCR) by adapting Vision-Language Models (VLMs).
Applies parameter-efficient fine-tuning to the Qwen2.5-VL model using LoRA.
Trains on a mixed-domain dataset of modern print and historical manuscripts.
Uses 4-bit quantization to keep training efficient.
Reduces Character Error Rate by 29% on modern print documents.
Improves accuracy by 17% on historical documents.
Outperforms standard OCR engines on the KITAB-Bench benchmark.
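
To give a rough sense of why LoRA and 4-bit quantization keep training cheap, here is a minimal sketch in plain Python. It counts the trainable parameters of a low-rank update (two factors, B of size d_out x r and A of size r x d_in) against the full weight matrix, and estimates weight storage at a given bit width. The layer size, rank, and helper names are hypothetical illustrations, not values taken from the paper or from Qwen2.5-VL, and the storage estimate ignores the small overhead of quantization constants.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of a LoRA update W + B @ A.

    B has shape (d_out, rank) and A has shape (rank, d_in); the original
    weight W stays frozen.
    """
    return rank * (d_out + d_in)


def weight_bytes(n_params: int, bits: int) -> int:
    """Approximate storage for n_params weights at the given bit width.

    Assumption: ignores per-block scaling constants used by real quantizers.
    """
    return n_params * bits // 8


if __name__ == "__main__":
    # Hypothetical layer size and rank, chosen only for illustration.
    d_out, d_in, rank = 2048, 2048, 16
    full = full_params(d_out, d_in)
    lora = lora_params(d_out, d_in, rank)
    print(f"full fine-tuning: {full} trainable parameters")
    print(f"LoRA (r={rank}): {lora} trainable parameters")
    print(f"trainable fraction: {lora / full:.4%}")
    print(f"frozen weight at 16 bits: {weight_bytes(full, 16)} bytes")
    print(f"frozen weight at 4 bits:  {weight_bytes(full, 4)} bytes")
```

The point of the sketch is the scaling: the trainable count grows with r * (d_out + d_in) rather than d_out * d_in, while quantizing the frozen weights to 4 bits cuts their storage to a quarter of the 16-bit footprint.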

Abstract

Arabic Optical Character Recognition (OCR) is a difficult task because of the language’s cursive morphology, frequent diacritic variation, and the range of calligraphic styles found across printed, handwritten, and historical documents. Recent breakthroughs in Vision-Language Models (VLMs) have brought remarkable progress in multilingual OCR; however, their systematic adaptation to Arabic remains underexplored. This paper presents a parameter-efficient fine-tuning of the Qwen2.5-VL model using LoRA with 4-bit quantization, adapted to a mixed-domain dataset that consists of both modern print and historical manuscripts. The proposed method allows efficient training with relatively modest computational resources. On the KITAB-Bench benchmark, the adapted model shows considerable gains: a 29% reduction in Character Error Rate on modern print and a 17% accuracy improvement on historical documents, outperforming standard OCR engines. These findings demonstrate the capability of VLM-based approaches for robust Arabic OCR and the need for resource-efficient adaptation strategies for practical deployment.
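
For readers unfamiliar with the headline metric, the sketch below shows one common way to compute Character Error Rate: the character-level edit (Levenshtein) distance between a reference transcription and the OCR output, divided by the reference length. It is an illustration only, not the evaluation code used in the paper or by KITAB-Bench. The helper relative_reduction assumes the reported reduction is relative to a baseline CER, which the abstract does not state, and the numbers in the usage lines are made up.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, adapted: float) -> float:
    """Relative drop in error rate (assumed reading of 'reduction')."""
    return (baseline - adapted) / baseline


if __name__ == "__main__":
    # One missing letter out of four reference characters.
    print(cer("كتاب", "كتب"))
    # Made-up error rates, only to show the arithmetic.
    print(relative_reduction(0.100, 0.071))
```

Because Python strings iterate over code points, Arabic diacritics stored as combining marks count as separate characters here. Whether to strip or normalize them before scoring is an evaluation choice that changes the resulting CER.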
