ABSTRACT Sign language recognition (SLR) requires interpreting dynamic hand gestures with complex variations in shape, orientation, motion, and spatial configuration. Conventional models such as U‐Net and ResNet offer strengths in segmentation and feature extraction, respectively, but face critical limitations. U‐Net struggles to retain fine spatial details in cluttered backgrounds and lacks temporal modeling, while ResNet can lose motion continuity and suffers from vanishing gradients in deeper architectures. To overcome these challenges, we propose the holistic sign language interpretation network (HSLIN), a novel deep learning framework tailored for Indian sign language (ISL) recognition. HSLIN incorporates three key innovations: uniformed frame isolation and augmentation (UFIA) for standardized preprocessing and noise removal, synaptic gesture movement analysis (SGMA) for capturing detailed motion using keypoint detection and optical flow, and a hybrid architecture combining U‐Net‐based segmentation with an enhanced ResNet‐TC50V2 backbone. The novelty lies in fusing spatial precision with deep temporal modeling through bottleneck layers and temporal convolutional layers (TCL), enabling the model to learn gesture patterns over time. Experimental results on the ISL‐CSLTR dataset demonstrate that the proposed method achieves an accuracy of 99.9%, a precision of 100%, a recall of 99.9%, and an F1‐score of 100% across 14 word‐level sign classes. Furthermore, an ablation study confirms that each architectural component contributes to the reported performance. These outcomes indicate the robustness and efficiency of the proposed HSLIN framework, positioning it as a strong candidate for real‐world ISL recognition and communication accessibility for the deaf and hard‐of‐hearing community.
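To make the idea of a temporal convolutional layer concrete, the following is a minimal illustrative sketch, not the HSLIN implementation. It assumes each video frame has already been reduced to a fixed-length feature vector, and it slides one shared kernel along the time axis (as cross-correlation, the convention most deep learning frameworks use). The function name `temporal_conv1d`, the toy feature values, and the two-tap difference kernel are all hypothetical choices made for illustration.

```python
from typing import List


def temporal_conv1d(frames: List[List[float]], kernel: List[float]) -> List[List[float]]:
    """Apply one shared 1D kernel along the time axis of per-frame features.

    frames: T feature vectors (one per video frame), each of length D.
    kernel: K temporal weights; the output has T - K + 1 time steps.
    """
    k = len(kernel)
    if k == 0 or len(frames) < k:
        raise ValueError("need a non-empty kernel and at least len(kernel) frames")
    dim = len(frames[0])
    out = []
    for t in range(len(frames) - k + 1):
        window = frames[t:t + k]  # K consecutive frames starting at time t
        # Weighted sum over time, computed independently for each feature dimension.
        out.append([sum(kernel[i] * window[i][d] for i in range(k)) for d in range(dim)])
    return out


# Hypothetical 2-dimensional features for 4 consecutive frames.
features = [[0.0, 1.0], [1.0, 1.0], [3.0, 1.0], [6.0, 1.0]]

# A [-1, 1] kernel yields frame-to-frame differences, a crude motion cue.
motion = temporal_conv1d(features, [-1.0, 1.0])
print(motion)
```

In this toy input the first feature changes over time while the second stays constant, so only the first output channel is non-zero. A learned temporal layer would use trained kernels, typically many of them, in place of the fixed difference kernel.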
Reeja et al. studied this question.