July 1, 1992

Omnidocument technologies

Abstract

An optical character recognition (OCR) engine that is omnifont and reasonably robust on individual degraded characters is presented. The weakest link is its handling of characters which are difficult to segment. The engine is divided into four phases: segmentation, image recognition, ambiguity resolution, and document analysis. The features are zonal and reduce the image to a blurred, gray-level representation. The classifier is data-driven, trained offline, and model-free. Handcrafted features and decision trees tend to be brittle in the presence of noise. To satisfy the needs of full-text applications, the system captures the structure of the document so that, when viewed in a word processor or spreadsheet program, the formatting of the optically recognized document reflects that of the original document. To satisfy the needs of the forms market, a proofing and correction tool displays 'pop-up' images of uncertain characters.
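
The zonal features mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the grid size, the blurring method, or the image format, so everything here is an assumption: the helper name `zonal_features`, the 2 x 2 grid, and the use of average ink density per zone as the "blurred, gray-level" value are hypothetical and are not taken from the paper.

```python
from typing import List


def zonal_features(glyph: List[List[int]], rows: int = 2, cols: int = 2) -> List[float]:
    """Reduce a binary glyph image to one gray level per zone.

    Hypothetical helper: the image is split into a rows x cols grid and
    each zone is summarized by its mean ink density (0.0 = blank, 1.0 = solid).
    """
    height, width = len(glyph), len(glyph[0])
    features = []
    for r in range(rows):
        # Integer zone boundaries along the vertical axis.
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            # Integer zone boundaries along the horizontal axis.
            left, right = c * width // cols, (c + 1) * width // cols
            cells = [glyph[y][x] for y in range(top, bottom) for x in range(left, right)]
            # An empty zone (grid finer than the image) is treated as blank.
            features.append(sum(cells) / len(cells) if cells else 0.0)
    return features


# Toy 4 x 4 glyph: 1 = ink, 0 = background (made-up data for illustration).
GLYPH = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]

print(zonal_features(GLYPH))  # [1.0, 0.0, 0.0, 0.75]
```

Averaging over zones discards exact pixel positions, which is one plausible reason such features would tolerate noise better than handcrafted stroke features; a data-driven classifier would then be trained on these vectors.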

Cite This Study

Mindy Bokser (July 1, 1992) studied this question.

synapsesocial.com/papers/6a1c6be623b9c7180b2fc168 https://doi.org/10.1109/5.156470
