Visually impaired individuals face significant challenges in reading traditional printed books, because printed text can only be accessed by sight. As a result, they often depend on assistive technologies such as screen readers and audiobooks to access written material. However, converting printed books into audiobooks is time-consuming, labor-intensive, and expensive: it involves hiring professional narrators, recording the entire book, and editing the audio to ensure clarity and quality. To address this problem, I propose a machine learning-based Optical Character Recognition (OCR) system that converts images of text into audio signals. The proposed system uses convolutional neural networks (CNNs) and long short-term memory (LSTM) networks for text recognition and conversion. The approach achieved a word-based exact matching score of 93.724. Furthermore, I implemented the system on a low-cost embedded board to demonstrate its feasibility and applicability in real-world scenarios. I expect that this approach can help visually impaired individuals access written content more easily and affordably.
Ariyoshi et al. (Fri,) studied this question.
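
To make the reported metric concrete, the following is a minimal sketch of how a word-based exact matching score could be computed. It assumes the score is the percentage of words whose recognized text equals the ground-truth text exactly; the abstract does not define the metric, so this reading, the function name `word_exact_match`, and the sample words are illustrative assumptions rather than the evaluation code used in this work.

```python
def word_exact_match(predictions, references):
    """Return the percentage of words recognized exactly.

    Assumed definition (not taken from the paper): a word counts as
    correct only if the predicted string equals the reference string
    character for character.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # Count positions where the recognized word matches the ground truth.
    hits = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


# Hypothetical OCR output for four word images; one word is misrecognized.
preds = ["the", "quick", "brovvn", "fox"]
refs = ["the", "quick", "brown", "fox"]
score = word_exact_match(preds, refs)
print(f"word-based exact match: {score:.3f}")
```

Under this definition a single wrong character makes the whole word count as an error, so the score is stricter than a character-level accuracy computed on the same output.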