Recent advances in multimedia technology have changed how people interact with digital devices, and sign language recognition has emerged as a crucial tool for improving accessibility. This article introduces an approach to integrating language-based sign language recognition into multimedia applications, enabling automatic recognition of gestures from the sign languages of various nationalities in dynamic (video) settings. Our research explores the potential of this technology to enhance accessibility and usability, particularly for individuals with hearing impairments, by facilitating intuitive, real-time control over video conferencing applications. We identify existing challenges in multimedia applications and propose a novel framework that incorporates sign language recognition algorithms. As part of this approach, we develop a prototype multimedia application designed for fast communication in crowded environments and against crowded backgrounds. Feature extraction is performed with the MediaPipe Holistic framework, from which we derive hand-shoulder distances, finger angles, and finger usage. Gesture classification is performed with the K-Nearest Neighbors (KNN) algorithm, which recognizes international sign language gestures. For language and word prediction, we employ Convolutional Recurrent Neural Networks (CRNNs) enhanced by Long Short-Term Memory (LSTM) to process diverse linguistic contexts. Experimental results confirm the robustness of our approach, which achieves 89% accuracy in multiclass gesture classification and exceeds 93% accuracy in word prediction on a large, diverse dataset collected from multiple sources and languages. To enhance practical applicability, we integrated our techniques into a custom video conferencing application built with Django. This application incorporates our feature extraction and prediction models and offers a two-way communication platform with improved accessibility features.
Yüksel et al. studied this question.
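To make the gesture-classification stage described in the abstract more concrete, the following is a minimal sketch of how finger angles might be computed from hand landmarks and passed to a KNN classifier. It is an illustration under stated assumptions, not the authors' implementation. The function names (`joint_angle`, `finger_angles`, `knn_predict`), the choice of joint triples in `FINGER_JOINTS`, the labels `"open"` and `"fist"`, and `k=3` are all hypothetical. The landmark indices assume MediaPipe's 21-point hand model; in practice the coordinates would come from MediaPipe Holistic, whereas here they are plain tuples. Hand-shoulder distances and finger usage are not covered.

```python
import math
from collections import Counter

# Hypothetical joint triples (a, b, c); the angle is measured at landmark b.
# Indices assume MediaPipe's 21-point hand model (0 = wrist,
# 1-4 thumb, 5-8 index, 9-12 middle, 13-16 ring, 17-20 pinky).
FINGER_JOINTS = [(1, 2, 3), (5, 6, 7), (9, 10, 11), (13, 14, 15), (17, 18, 19)]


def joint_angle(a, b, c):
    """Return the angle in degrees at point b between segments b->a and b->c."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0.0:
        return 0.0  # degenerate case: two landmarks coincide
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos))


def finger_angles(landmarks):
    """Build a five-value feature vector, one joint angle per finger."""
    return [
        joint_angle(landmarks[a], landmarks[b], landmarks[c])
        for a, b, c in FINGER_JOINTS
    ]


def knn_predict(train_features, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training vectors."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy training set: extended fingers give angles near 180 degrees,
# curled fingers give much smaller angles. Values are made up.
train_features = [[170.0] * 5, [175.0] * 5, [60.0] * 5, [65.0] * 5]
train_labels = ["open", "open", "fist", "fist"]

# A perfectly straight synthetic hand: all 21 landmarks on one line.
straight_hand = [(float(i), 0.0, 0.0) for i in range(21)]
prediction = knn_predict(train_features, train_labels, finger_angles(straight_hand))
```

Angles are used here because they do not change with the hand's position or scale in the frame, which is one plausible reason to prefer them over raw coordinates. A production system would more likely use an established KNN implementation and a much larger labeled feature set.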