Sign language is a diverse form of communication that conveys messages through hand signs, facial expressions and body movements. Online content is abundant, yet too little of it is accessible to people with hearing impairment. This work proposes a new approach to multimodal Sign Language Generation (SLG) that combines hybrid optimization methods with Deep Learning (DL) algorithms. The proposed framework improves the efficiency of SLG by processing and fusing data from audio, text and video. The methodology comprises six phases: preprocessing, feature extraction, feature selection with hybrid optimization, multimodal fusion, DL modeling and SLG. A Recurrent Neural Network (RNN) extracts temporal features from audio, ResNet50 extracts spatial features from video, and BERT extracts textual features. Feature selection then uses the Hybrid Grey Wolf and Fruit Fly Optimization (HGWFFO) technique to choose the best features. The framework integrates an attention-based DL model that uses Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNN) and transformer models for feature fusion and sign generation. Sign generation runs in a loop, with feedback collected in real time during calibration. The proposed framework is intended to advance the state of the art in SLG by offering an effective approach with better performance and readability. The developed model achieved an accuracy of 0.998817, a specificity of 0.998799 and a sensitivity of 0.990125.
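For readers unfamiliar with the three reported metrics, the sketch below shows how accuracy, specificity and sensitivity are conventionally derived from confusion-matrix counts. It is a minimal illustration and not the authors' evaluation code: the function name `binary_metrics` and the example counts are hypothetical, and it assumes a binary (or one-vs-rest) view of the predictions, which the abstract does not specify.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, sensitivity and specificity from confusion counts.

    tp/fp/tn/fn are true/false positive/negative counts. How the paper
    aggregates these across sign classes is not stated; this assumes a
    single binary (or one-vs-rest) confusion matrix.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return {
        # Fraction of all predictions that are correct.
        "accuracy": (tp + tn) / total,
        # True positive rate: correctly detected positives.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # True negative rate: correctly rejected negatives.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    print(binary_metrics(tp=90, fp=5, tn=95, fn=10))
```

A sensitivity lower than the specificity, as in the reported figures, means the model misses positives slightly more often than it raises false alarms.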
Santhanakrishnan et al. (Fri,) studied this question.