This work describes BlindSpot-VisionGuide, an integrated AI-based assistive system that aims to help visually impaired people live more independently through real-time audio interaction. The system combines three core capabilities (face recognition, image captioning, and online newspaper reading) in a voice-driven platform deployable on Raspberry Pi hardware. The face recognition module identifies known people using deep facial embeddings and gives immediate spoken feedback. The image captioning module uses the transformer-based BLIP model to generate natural-language descriptions of captured scenes. The online newspaper module fetches structured news content through APIs and converts it to speech with a text-to-speech engine. A single voice interface controls all modules, so users can interact with their surroundings hands-free. The system was evaluated for recognition accuracy, response time, and memory consumption on a Raspberry Pi 5. Experiments indicate that all modules operate reliably and that the platform balances computational cost against ease of use. Optimized for offline use and low-power devices, BlindSpot demonstrates the practical applicability of embedded AI to building inclusive, scalable assistive technology. The authors conclude by outlining potential extensions, such as object detection, multi-language support, and caregiver integration, positioning BlindSpot as a foundation for next-generation vision-based accessibility systems.
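The abstract does not state which embedding model, distance metric, or match threshold the face recognition module uses. As a minimal sketch only, assuming embeddings have already been produced by some face-embedding model and that identification is a nearest-neighbor search with a Euclidean distance cutoff, the matching and spoken-feedback step could look like the following. The names `identify`, `announce`, and `MATCH_THRESHOLD`, the 0.6 value, and the announcement phrasing are all hypothetical, not taken from the paper.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical cutoff: 0.6 is a common default for 128-d dlib-style embeddings,
# but the paper does not report the value BlindSpot uses.
MATCH_THRESHOLD = 0.6


def identify(
    probe: Sequence[float],
    known: Dict[str, List[float]],
    threshold: float = MATCH_THRESHOLD,
) -> Optional[str]:
    """Return the name of the closest known face, or None if none is within threshold."""
    best_name: Optional[str] = None
    best_dist = float("inf")
    for name, emb in known.items():
        d = math.dist(probe, emb)  # Euclidean distance between two embeddings
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= threshold else None


def announce(name: Optional[str]) -> str:
    """Build the sentence that would be passed to a text-to-speech engine (hypothetical wording)."""
    return f"{name} is in front of you." if name else "Unknown person."


if __name__ == "__main__":
    # Toy 3-d vectors stand in for real face embeddings.
    gallery = {"Alice": [0.1, 0.2, 0.3], "Bob": [0.9, 0.8, 0.7]}
    print(announce(identify([0.12, 0.21, 0.29], gallery)))  # close to Alice
    print(announce(identify([5.0, 5.0, 5.0], gallery)))     # far from everyone
```

Rejecting matches above a threshold, rather than always returning the nearest name, is what lets such a system say "unknown person" instead of misidentifying a stranger; the right cutoff depends on the embedding model and would need tuning.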
Sudha et al. studied this question.