What question did this study set out to answer?

The aim is to develop a framework for more accurate sign language recognition by integrating various deep learning techniques.

January 23, 2026

Advanced Sign Language Translation: A Holistic Network for Hand Gesture Recognition Using Deep Learning

Key Points

Developed the holistic sign language interpretation network (HSLIN) for Indian sign language recognition.
Incorporated uniformed frame isolation and augmentation (UFIA) for preprocessing and noise removal.
Utilized synaptic gesture movement analysis (SGMA) to capture detailed motion through keypoint detection and optical flow.
Achieved 99.9% accuracy on the ISL-CSLTR dataset.
Recorded 100% precision and F1-score, with 99.9% recall, across 14 word-level sign classes.
An ablation study confirmed the effectiveness of architectural components in enhancing performance.
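
The metrics listed above are all derived from a confusion matrix. The sketch below shows one common way to compute them for a multi-class classifier. It is illustrative only: the function names, the macro-averaging choice, and the 3-class toy matrix are assumptions made here, not details or data from the study, which does not state how its scores were averaged.

```python
def accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


def per_class_metrics(confusion):
    """Return a (precision, recall, f1) tuple for each class.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(confusion[k]) - tp                       # true k, predicted class differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


def macro_average(results):
    """Unweighted mean of precision, recall, and F1 over all classes (an assumed choice)."""
    n = len(results)
    return tuple(sum(r[i] for r in results) / n for i in range(3))


# Hypothetical 3-class confusion matrix with made-up counts (not data from the study).
toy = [
    [50, 0, 0],
    [1, 49, 0],
    [0, 0, 50],
]

if __name__ == "__main__":
    print("accuracy:", accuracy(toy))
    print("macro precision, recall, F1:", macro_average(per_class_metrics(toy)))
```

With 14 classes the same functions apply unchanged to a 14 x 14 matrix.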

Abstract

Sign language recognition (SLR) requires interpreting dynamic hand gestures with complex variations in shape, orientation, motion, and spatial configuration. Conventional models such as U‐Net and ResNet offer strengths in segmentation and feature extraction, respectively, but face critical limitations. U‐Net struggles with retaining fine spatial details in cluttered backgrounds and lacks temporal modeling, while ResNet can lose motion continuity and suffers from vanishing gradient issues in deeper architectures. To overcome these challenges, we propose the holistic sign language interpretation network (HSLIN), a novel deep learning framework tailored for Indian sign language (ISL) recognition. HSLIN incorporates three key innovations: uniformed frame isolation and augmentation (UFIA) for standardized preprocessing and noise removal, synaptic gesture movement analysis (SGMA) for capturing detailed motion using keypoint detection and optical flow, and a hybrid architecture combining U‐Net‐based segmentation with an enhanced ResNet‐TC50V2 backbone. The novelty lies in fusing spatial precision with deep temporal modeling through bottleneck layers and temporal convolutional layers (TCL), enabling the model to effectively learn gesture patterns over time. Experimental results on the ISL‐CSLTR dataset demonstrate that the proposed method achieves an accuracy of 99.9%, a precision of 100%, recall of 99.9%, and an F1‐score of 100% across 14 word‐level sign classes. Furthermore, an ablation study confirms the critical role of each architectural component in achieving optimal performance. These outcomes clearly establish the robustness, efficiency, and uniqueness of the proposed HSLIN framework, positioning it as a powerful solution for real‐world ISL recognition and communication accessibility for the deaf and hard‐of‐hearing community.
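
To make two of the ideas in the abstract more concrete, the following is a minimal pure-Python sketch of (a) picking evenly spaced frames from a clip and (b) sliding a temporal kernel over per-frame features. Both functions are hypothetical illustrations: the abstract does not specify how UFIA selects frames or how the temporal convolutional layers are configured, so uniform sampling and a single kernel shared across channels are assumptions made here, not the authors' implementation.

```python
def uniform_frame_indices(total_frames, num_samples):
    """Pick num_samples evenly spaced frame indices from a clip (assumed sampling scheme)."""
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    if num_samples >= total_frames:
        return list(range(total_frames))
    step = total_frames / num_samples
    # Take the centre of each of num_samples equal-width segments.
    return [int(step * i + step / 2) for i in range(num_samples)]


def temporal_conv1d(sequence, kernel):
    """Slide a kernel along the time axis of a feature sequence, without padding.

    sequence: list of T frames, each a list of C feature values.
    kernel:   list of K temporal weights, shared across channels (a simplification;
              learned layers normally hold separate weights per channel).
    Returns T - K + 1 frames of C values each. As in most deep learning
    libraries, the kernel is not flipped (cross-correlation).
    """
    t, k = len(sequence), len(kernel)
    if k > t:
        raise ValueError("kernel is longer than the sequence")
    channels = len(sequence[0])
    out = []
    for start in range(t - k + 1):
        frame = [
            sum(kernel[j] * sequence[start + j][c] for j in range(k))
            for c in range(channels)
        ]
        out.append(frame)
    return out


# Toy features: channel 0 rises by 1 per frame, channel 1 is constant.
features = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
difference_kernel = [-1.0, 1.0]  # responds to frame-to-frame change

if __name__ == "__main__":
    print(uniform_frame_indices(100, 4))
    print(temporal_conv1d(features, difference_kernel))
```

The difference kernel responds only to the channel that changes over time, which is the intuition behind using temporal layers to pick up gesture motion rather than static hand shape.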


Cite This Study

Reeja et al. studied this question.

synapsesocial.com/papers/69730f18c8125b09b0d1ef2a https://doi.org/10.1002/cav.70084
