This project presents a real-time American Sign Language (ASL) recognition system that runs on a standard webcam. Communication between deaf or hard-of-hearing individuals and the hearing community is often constrained by the high cost and limited availability of professional interpreters. To address this, the proposed system uses an ensemble deep-learning approach that combines a Convolutional Neural Network (CNN) for hand-shape recognition, a Graph Neural Network (GNN) to capture relationships between fingers and joints, and a Vision Transformer to focus on key visual regions while suppressing background noise. Fusing these complementary models improves recognition accuracy. The framework was trained and evaluated on a dataset of approximately 87,000 labeled images covering the complete ASL alphabet along with additional gestures such as space and delete. Experimental results show an accuracy exceeding 95%, which the authors report outperforms existing methods. The system supports real-time interaction, with an average inference time of about 85 milliseconds per gesture. It is deployed through a browser-based interface and requires no specialized hardware beyond a standard webcam. This solution provides an accessible, low-cost alternative to traditional interpretation services and promotes inclusive communication across educational, healthcare, and public environments.
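The abstract does not say how the three model outputs are fused, so the following is only a minimal sketch of one common option, soft voting (a weighted average of per-model class probabilities). The label subset, the logits, and the function names `softmax`, `fuse`, and `predict` are all hypothetical and chosen for illustration; they are not taken from the authors' implementation.

```python
import math

# Hypothetical label subset for illustration only; the described system covers
# the full ASL alphabet plus extra gestures such as space and delete.
LABELS = ["A", "B", "space", "delete"]


def softmax(logits):
    """Convert raw scores to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(per_model_logits, weights=None):
    """Soft voting: weighted average of each model's class probabilities."""
    n_models = len(per_model_logits)
    if weights is None:
        weights = [1.0] * n_models  # assumption: equal trust in every model
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per model")
    probs = [softmax(logits) for logits in per_model_logits]
    n_classes = len(probs[0])
    if any(len(p) != n_classes for p in probs):
        raise ValueError("all models must score the same set of classes")
    weight_sum = sum(weights)
    return [
        sum(w * p[c] for w, p in zip(weights, probs)) / weight_sum
        for c in range(n_classes)
    ]


def predict(per_model_logits, weights=None):
    """Return the highest-probability label and its fused probability."""
    fused = fuse(per_model_logits, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return LABELS[best], fused[best]


# Made-up logits standing in for the CNN, GNN, and Vision Transformer outputs.
cnn_logits = [2.0, 1.0, 0.1, 0.0]
gnn_logits = [1.5, 1.8, 0.2, 0.1]
vit_logits = [2.2, 0.9, 0.0, 0.3]

label, confidence = predict([cnn_logits, gnn_logits, vit_logits])
print(label, round(confidence, 3))
```

In this toy input the GNN alone slightly prefers "B", but the averaged probabilities still favor "A", which illustrates how an ensemble can override a single model's mistake. Unequal `weights` could be used if one model proved more reliable on validation data; that choice is likewise an assumption here.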
Samruddhi Vijay Wakalkar
Sanskruti Vijay Wakalkar
Siddhi Nanasaheb Hon
Wakalkar et al. (Thu,) studied this question.
www.synapsesocial.com/papers/699012032ccff479cfe58b0b — DOI: https://doi.org/10.5281/zenodo.18619156