What question did this study set out to answer?

The aim is to develop a robust system for detecting and extracting text embedded in images, overcoming limitations of current OCR technologies.

January 1, 1997 · Open Access

Finding text in images

Key Points

A four-step procedure is proposed to detect and extract text from images.
Texture segmentation focuses attention on regions where text may occur; strokes are then extracted from those regions and grouped into bounding boxes around text strings.
To handle a wide range of font sizes, the method is applied to a pyramid of images at multiple resolutions, and the boxes found at each level are fused at the original resolution.
The system detects text across a wide range of font sizes and against shaded, textured, or image backgrounds.
The authors report that it is stable and robust on images from a wide variety of sources, a setting that current document segmentation and OCR technologies handle poorly.
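
The multi-resolution idea in the key points can be illustrated with a minimal sketch. It is not the authors' implementation: the 2x2 averaging used to build the pyramid, the factor-of-two scaling between levels, and the "merge any overlapping boxes" fusion rule are all assumptions chosen for simplicity, and the helper names (build_pyramid, to_original, fuse) are hypothetical. The per-level text detector is left out; the sketch only shows how boxes found at a coarse level could be mapped back to the original resolution and fused.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open, in pixels
Image = List[List[float]]        # row-major grayscale image


def downsample(image: Image) -> Image:
    """Halve each dimension by averaging non-overlapping 2x2 blocks (assumed scheme)."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w)
        ]
        for y in range(h)
    ]


def build_pyramid(image: Image, levels: int) -> List[Image]:
    """Level 0 is the input; each later level is half the size of the previous one."""
    pyramid = [image]
    for _ in range(levels - 1):
        if len(pyramid[-1]) < 2 or len(pyramid[-1][0]) < 2:
            break  # too small to halve again
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def to_original(box: Box, level: int) -> Box:
    """Map a box found at a pyramid level back to level-0 coordinates."""
    scale = 2 ** level
    x0, y0, x1, y1 = box
    return (x0 * scale, y0 * scale, x1 * scale, y1 * scale)


def overlaps(a: Box, b: Box) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def fuse(boxes: List[Box]) -> List[Box]:
    """Merge overlapping boxes into their union (a simple stand-in fusion rule)."""
    fused: List[Box] = []
    for box in boxes:
        merged = True
        while merged:
            merged = False
            for other in fused:
                if overlaps(box, other):
                    fused.remove(other)
                    box = (min(box[0], other[0]), min(box[1], other[1]),
                           max(box[2], other[2]), max(box[3], other[3]))
                    merged = True
                    break  # restart the scan with the enlarged box
        fused.append(box)
    return fused
```

In use, a detector would be run on every image returned by build_pyramid, each resulting box passed through to_original with its level index, and the combined list handed to fuse. Large fonts that are missed at full resolution become ordinary-sized text at a coarser level, which is the motivation the abstract gives for the pyramid.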

Abstract

When machine generated text is printed against clean backgrounds, it can be converted to a computer readable form (ASCII) using current Optical Character Recognition (OCR) technology. However, text is often printed against shaded or textured backgrounds or is embedded in images. Examples include maps, advertisements, photographs, videos and stock certificates. Current document segmentation and recognition technologies cannot handle these situations well. In this paper, a four-step system which automatically detects and extracts text in images is proposed. First, a texture segmentation scheme is used to focus attention on regions where text may occur. Second, strokes are extracted from the segmented text regions. Using reasonable heuristics on text strings such as height similarity, spacing and alignment, the extracted strokes are then processed to form rectangular boxes surrounding the corresponding text strings. To detect text over a wide range of font sizes, the above steps are first applied to a pyramid of images generated from the input image, and then the boxes formed at each resolution level of the pyramid are fused at the image in the original resolution level. Third, text is extracted by cleaning up the background and binarizing the detected text strings. Finally, better text bounding boxes are generated by using the binarized text as strokes. Text is then cleaned and binarized from these new boxes, and can then be passed through a commercial OCR engine for recognition if the text is of an OCR-recognizable font. The system is stable, robust, and works well on images (with or without structured layouts) from a wide variety of sources, including digitized video frames, photographs,
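
The stroke-grouping step described in the abstract (height similarity, spacing and alignment) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's procedure: strokes are represented only by their bounding boxes, the three thresholds (max_height_ratio, min_overlap, max_gap_factor) are hypothetical values picked for the example, and grouping is done with a plain union-find over all stroke pairs.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) bounding box of one stroke


def height(box: Box) -> int:
    return box[3] - box[1]


def compatible(a: Box, b: Box,
               max_height_ratio: float = 1.5,   # hypothetical threshold
               min_overlap: float = 0.5,        # hypothetical threshold
               max_gap_factor: float = 1.0) -> bool:  # hypothetical threshold
    """True if two strokes plausibly belong to the same text string."""
    ha, hb = height(a), height(b)
    if min(ha, hb) <= 0:
        return False
    # Height similarity: characters in one string have comparable heights.
    if max(ha, hb) / min(ha, hb) > max_height_ratio:
        return False
    # Alignment: the strokes must share most of their vertical extent.
    vertical_overlap = min(a[3], b[3]) - max(a[1], b[1])
    if vertical_overlap < min_overlap * min(ha, hb):
        return False
    # Spacing: the horizontal gap must be small relative to the stroke height.
    gap = max(a[0], b[0]) - min(a[2], b[2])
    return gap <= max_gap_factor * max(ha, hb)


def group_strokes(strokes: List[Box]) -> List[Box]:
    """Link compatible strokes and return one bounding box per group, sorted."""
    parent = list(range(len(strokes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(strokes)):
        for j in range(i + 1, len(strokes)):
            if compatible(strokes[i], strokes[j]):
                parent[find(i)] = find(j)

    groups: Dict[int, List[Box]] = {}
    for i, stroke in enumerate(strokes):
        groups.setdefault(find(i), []).append(stroke)

    return sorted(
        (min(s[0] for s in members), min(s[1] for s in members),
         max(s[2] for s in members), max(s[3] for s in members))
        for members in groups.values()
    )
```

For example, three character-sized strokes on one baseline are merged into a single string box, while a distant stroke on another line stays separate. A real system would probably also discard groups with too few strokes, but the abstract does not specify such a rule, so none is applied here.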
