What question did this study set out to answer?

This review aims to synthesize developments in deep learning methods for video-based sign language recognition and to examine the field's persistent challenges and dataset gaps, particularly for low-resource sign languages.

May 24, 2026 · Open Access

A Comprehensive Review of Deep Learning Approaches for Video-Based Sign Language Recognition: Datasets, Challenges and Insights

Key Points

Reviewed over 100 research papers published between 2020 and 2026
Analyzed deep learning approaches including feature extraction, attention mechanisms, and hybrid frameworks
Developed a six-category taxonomy for systematic comparison of architectures
Identified persistent challenges like data scarcity in low-resource sign languages
Highlighted recent advancements such as real-time recognition systems and multimodal fusion
Outlined strategic directions for improving dataset development in sign language recognition

Abstract

This study presents a comprehensive review of more than 100 research papers on sign language recognition (SLR) published between 2020 and 2026. The analysis focuses on deep learning approaches applied to video-based SLR, including spatiotemporal feature extraction, temporal modeling, attention mechanisms, motion-based representations, hybrid frameworks, transfer learning, and other methods. Particular attention is given to how these methods model spatiotemporal dynamics and capture subtle gesture characteristics in sign language communication. The review highlights several recent developments, such as the introduction of specialized datasets, the emergence of real-time recognition systems, and the integration of multimodal fusion strategies. At the same time, persistent challenges remain, including data scarcity in low-resource sign languages, limited linguistic standardization of datasets, and insufficient model interpretability. The findings underline the importance of developing scalable and generalizable models capable of handling diverse datasets and user variability. The distinct contributions of this review are fourfold: (1) a comprehensive synthesis of over 100 studies published between 2020 and 2026, covering the full spectrum of deep learning architectures for video-based SLR; (2) a structured six-category taxonomy enabling systematic cross-architectural comparison; (3) a dedicated focus on low-resource sign languages, which remain underrepresented in the existing literature; and (4) a critical analysis of the current benchmark landscape for low-resource sign languages, identifying key gaps and outlining strategic directions for future dataset development. These contributions are intended to guide further research toward more robust, inclusive, and universally applicable SLR systems.
