Data visualizations such as bar charts and pie charts, and the analyses drawn from them, remain largely inaccessible to blind and visually impaired individuals, despite their widespread use in education, science, and public communication. In this work, we present a learning-based framework that automatically detects, extracts, and interprets the contents of bar charts and translates them into multimodal outputs such as text-to-speech narratives and personified representations. Our system employs a YOLOv8 object detection model trained on a custom-generated dataset of annotated bar charts, capable of identifying key chart components, including bars, slices, angles, axes, titles, and labels. Following detection, Optical Character Recognition (OCR) decodes textual labels and numerals, while a pixel-to-value mapping algorithm interprets bar heights relative to the axis scale. The extracted information is dynamically translated into audio descriptions and interactive, query-based speech outputs, enabling non-visual comprehension of chart data. To support usability, we designed the output flow to follow a hierarchical structure, from an overall summary down to specific values. Preliminary evaluations show that the system achieves high accuracy in detecting chart components and recovering bar values, with promising feedback from a blind user in pilot tests. This research contributes to inclusive AI, opening new directions for accessible data communication through the integration of computer vision, speech synthesis, and assistive reasoning in an abstract environment.
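The abstract does not say how the pixel-to-value mapping is computed. One plausible reading, assumed here rather than taken from the paper, is a linear least-squares fit between the pixel rows of OCR-read y-axis tick labels and their numeric values, after which a bar's value is read off at the top edge of its detected bounding box. The sketch below assumes a linear (not logarithmic) y-axis, vertical bars, and boxes in `(x1, y1, x2, y2)` pixel format with `y` growing downward; the names `fit_axis` and `bar_value` are hypothetical.

```python
from typing import List, Tuple


def fit_axis(ticks: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit value = slope * pixel_y + intercept from (pixel_y, value) tick pairs.

    Hypothetical helper: assumes a linear axis and at least two ticks at
    distinct pixel rows (e.g. tick positions from the detector, values from OCR).
    Least squares over all ticks tolerates small localization errors better
    than using only two ticks.
    """
    n = len(ticks)
    if n < 2:
        raise ValueError("need at least two axis ticks")
    mean_p = sum(p for p, _ in ticks) / n
    mean_v = sum(v for _, v in ticks) / n
    cov = sum((p - mean_p) * (v - mean_v) for p, v in ticks)
    var = sum((p - mean_p) ** 2 for p, _ in ticks)
    if var == 0:
        raise ValueError("ticks must lie at distinct pixel rows")
    slope = cov / var
    intercept = mean_v - slope * mean_p
    return slope, intercept


def bar_value(box: Tuple[float, float, float, float], slope: float, intercept: float) -> float:
    """Map a vertical bar's top edge (y1 of an xyxy box) to a data value."""
    _x1, y1, _x2, _y2 = box
    return slope * y1 + intercept


if __name__ == "__main__":
    # Ticks 0, 25, 50, 75 at pixel rows 400, 300, 200, 100 (image y grows downward).
    slope, intercept = fit_axis([(400, 0.0), (300, 25.0), (200, 50.0), (100, 75.0)])
    print(bar_value((10, 250, 40, 400), slope, intercept))
```

With these example ticks the fit gives a slope of -0.25 value units per pixel, so a bar whose top sits at row 250 maps to 37.5. A fit over all detected ticks, rather than a two-point interpolation, is a design choice made here so that one misread or misplaced tick has less influence on every bar value.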
Kour et al. (Sun,) studied this question.