What question did this study set out to answer?

The aim is to create an automated system that recognizes sign language gestures and facilitates communication for individuals with hearing and speech difficulties.

April 22, 2026 · Open Access

Sign Language Recognition and Communication System

Raghava Mattegunta

Key Points

The aim is to create an automated system that recognizes sign language gestures and facilitates communication for individuals with hearing and speech difficulties.
Used YOLOv8 for hand detection.
Employed MediaPipe for 21-point landmark extraction.
Implemented a Vision Transformer for gesture classification.
Achieved 91% accuracy in gesture recognition.
Kept CPU inference latency below 5 seconds.
Included speech-to-text, text-to-speech, and translation across six Indian languages.

Abstract

Communication is a vital part of human life. However, some individuals face challenges in communicating with others for reasons such as physical or psychological conditions. These conditions can lead to an inability to hear, to speak, or both, which limits access to social interaction. This need led to the creation of sign language. Sign language relies on hand gestures, facial expressions, and body movements. It remains inaccessible to most people, which widens the communication gap. Advances in Artificial Intelligence, Computer Vision, and Natural Language Processing have opened new ways to bridge this gap using automated gesture recognition systems. Early systems relied on data gloves, depth sensors, and CNN-based models to detect hand gestures. Their major shortcomings were expensive hardware and a limited gesture vocabulary. Many approaches achieved recognition only under controlled conditions and lacked robustness to varying lighting and real-world environments. Moreover, most prior systems did not provide communication support such as speech conversion or multilingual translation. The proposed system addresses these shortcomings using YOLOv8 for hand detection, MediaPipe for 21-point landmark extraction, and a Vision Transformer for gesture classification, achieving 91% accuracy with CPU inference latency below 5 seconds. Additional features include speech-to-text, text-to-speech, and translation across six Indian regional languages. A Streamlit front-end allows a hardware-free and accessible user experience.
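
The abstract does not say how the 21 landmarks are prepared before classification. As a minimal sketch of one plausible step, assuming each landmark is an (x, y, z) tuple with the wrist at index 0 (the MediaPipe Hands convention), the points can be shifted and scaled so the feature vector does not depend on where the hand sits in the frame or how large it appears. The function name normalize_landmarks and the normalization itself are illustrative assumptions, not the paper's documented method.

```python
import math

NUM_LANDMARKS = 21  # MediaPipe Hands reports 21 landmarks per hand; index 0 is the wrist


def normalize_landmarks(landmarks):
    """Return a 63-value feature vector invariant to hand position and size.

    `landmarks` is a sequence of 21 (x, y, z) tuples. This normalization is a
    hypothetical preprocessing step, not taken from the paper.
    """
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")

    wrist_x, wrist_y, wrist_z = landmarks[0]
    # Translate so the wrist sits at the origin.
    centered = [(x - wrist_x, y - wrist_y, z - wrist_z) for x, y, z in landmarks]

    # Scale by the largest wrist-to-landmark distance so hand size cancels out.
    scale = max(math.sqrt(x * x + y * y + z * z) for x, y, z in centered)
    if scale == 0.0:
        scale = 1.0  # degenerate input: every point coincides with the wrist

    # Flatten to [x0, y0, z0, x1, y1, z1, ...].
    return [coord / scale for point in centered for coord in point]
```

A vector of this kind could then be passed to a classifier. How the actual system turns landmarks into input for the Vision Transformer is not specified in the abstract.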
