Despite recent advances in deep learning (DL) for sign language recognition (SLR), most existing systems remain limited to monolingual datasets, lack interpretability, and are too computationally intensive for real-time edge deployment. With the growing need for inclusive, real-time communication technologies, efficient and deployable SLR systems are critically important. This paper presents TinyMSLR, an explainable, lightweight framework for isolated-sign (gloss) classification on resource-constrained devices. TinyMSLR combines a ConvNeXt-Tiny encoder for fine-grained local visual cues with a Swin Transformer encoder for long-range spatio-temporal context, and integrates an adaptive fusion gate to balance the two streams. To further improve accuracy under strict compute and memory budgets, we introduce a dual-teacher knowledge distillation (KD) scheme that transfers complementary spatial and contextual knowledge from high-capacity CNN and Transformer teachers to the compact student model. We evaluate TinyMSLR in a controlled multilingual setting using two public datasets, RWTH-PHOENIX-Weather 2014T (German Sign Language, DGS) and Mandarin CSL, by constructing a shared subset of 20 semantically aligned sign classes and segmenting the continuous RWTH sequences into single-gloss clips. All reported results therefore correspond to isolated-sign recognition rather than sentence-level multilingual continuous SLR (CSLR). On this benchmark, TinyMSLR achieves 99.28% training accuracy and 99.01% validation accuracy, with an F1-score of 98.96%, while keeping the parameter count under 2.7M. Inference latency is 24 ms on standard CPUs and under 13.5 ms on edge GPUs. Overall, TinyMSLR demonstrates a practical accuracy-efficiency-explainability trade-off that is well suited to deployable multilingual isolated-sign recognition on edge devices.
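The abstract names an adaptive fusion gate and a dual-teacher KD scheme but gives no formulas. The sketch below shows one common way such components are built, in plain Python. It is an illustration under stated assumptions, not the authors' implementation. The per-dimension sigmoid gate, the temperature-scaled KL objective, and the names `fusion_gate`, `dual_teacher_kd_loss`, `alpha`, `beta`, and `temperature` are all hypothetical choices made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def fusion_gate(local_feat, context_feat, gate_logits):
    """Blend two feature vectors with a per-dimension sigmoid gate.

    Assumption: in a trained model, gate_logits would come from a learned
    layer applied to both streams; here they are passed in directly.
    """
    fused = []
    for a, b, g in zip(local_feat, context_feat, gate_logits):
        w = 1.0 / (1.0 + math.exp(-g))  # gate weight in (0, 1)
        fused.append(w * a + (1.0 - w) * b)
    return fused


def dual_teacher_kd_loss(student_logits, cnn_teacher_logits,
                         transformer_teacher_logits, label,
                         temperature=4.0, alpha=0.5, beta=0.5):
    """Hard-label cross-entropy plus a weighted KL term per teacher.

    Assumption: alpha weights distillation against the hard-label loss and
    beta splits the distillation term between the two teachers.
    """
    hard_probs = softmax(student_logits)
    cross_entropy = -math.log(hard_probs[label] + 1e-12)

    student_soft = softmax(student_logits, temperature)
    kd_cnn = kl_divergence(softmax(cnn_teacher_logits, temperature), student_soft)
    kd_transformer = kl_divergence(
        softmax(transformer_teacher_logits, temperature), student_soft)
    kd = beta * kd_cnn + (1.0 - beta) * kd_transformer

    # The T^2 factor is the usual rescaling of softened-target gradients.
    return (1.0 - alpha) * cross_entropy + alpha * temperature ** 2 * kd
```

With all gate logits at zero, the gate weights both streams equally, so `fusion_gate([1.0, 2.0], [3.0, 4.0], [0.0, 0.0])` averages the two vectors. When both teachers agree exactly with the student, the distillation term vanishes and only the hard-label cross-entropy remains.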
Lamaakal et al. (Tue,) studied this question.