May 1, 2016 Open Access

Using SMT for OCR Error Correction of Historical Texts

Abstract

A trend to digitize historical paper-based archives has emerged in recent years, with the advent of digital optical scanners. Many paper-based books, textbooks, magazines, articles, and documents are being transformed into electronic versions that can be manipulated by a computer. For this purpose, Optical Character Recognition (OCR) systems have been developed to transform scanned digital text into editable computer text. However, OCR output contains different kinds of errors, and Automatic Error Correction tools can help improve the quality of electronic texts by cleaning them and removing noise. In this paper, we perform a qualitative and quantitative comparison of several error-correction techniques for historical French documents. Experiments show that our Machine Translation for Error Correction method is superior to other Language Modelling correction techniques, with nearly 13% relative improvement compared to the initial baseline.
