
May 15, 2026 | Open Access

A Multi-Model CNN Approach Using Pre-Trained Network for Improved Hand Gesture Recognition

Key points

The aim is to improve hand gesture recognition accuracy by addressing limitations such as lighting and posture variations.
Introduced a Gradient-Based Augmentation Validation (GBAV) framework for safe augmentation ranges.
Used a multi-backbone Convolutional Neural Network (CNN) architecture combining ResNet50 and InceptionV3.
Evaluated on three static gesture datasets: BISINDO, ASL, and HG14.
Achieved validation accuracies of 96.87% for BISINDO, 99.92% for ASL, and 95.25% for HG14.
Confirmed result stability on HG14 with 5-fold cross-validation (93.51% ± 2.31%).
Demonstrated computational efficiency, with a processing time of 26.8 ms per image (~37 FPS).
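The multi-backbone design listed above can be illustrated with a minimal NumPy sketch of a classification head sitting on top of the two feature extractors. The source does not say how the ResNet50 and InceptionV3 features are merged, so concatenation followed by a linear softmax classifier is an assumption here, as are the function names (`attention_pool`, `fuse_and_classify`) and the single-vector attention scoring. Random arrays stand in for the backbones' final feature maps; a real pipeline would take them from the pretrained networks.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)           # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def global_average_pool(fmap: np.ndarray) -> np.ndarray:
    """(H, W, C) feature map -> (C,) by averaging over spatial locations."""
    return fmap.mean(axis=(0, 1))


def attention_pool(fmap: np.ndarray, w_att: np.ndarray) -> np.ndarray:
    """(H, W, C) -> (C,) using softmax attention weights over the H*W locations.

    Hypothetical scoring: one learned vector w_att scores every location.
    """
    h, w, c = fmap.shape
    flat = fmap.reshape(h * w, c)                     # one row per spatial location
    alpha = softmax(flat @ w_att, axis=0)             # (H*W,) weights that sum to 1
    return alpha @ flat                               # attention-weighted sum of locations


def fuse_and_classify(feat_a, feat_b, w_cls, b_cls) -> np.ndarray:
    """Assumed fusion: concatenate pooled backbone vectors, then a linear softmax classifier."""
    fused = np.concatenate([feat_a, feat_b])
    return softmax(fused @ w_cls + b_cls)


rng = np.random.default_rng(0)
# Random stand-ins for the final feature maps of the two pretrained backbones.
resnet_map = rng.standard_normal((7, 7, 2048))        # ResNet50 output shape for a 224x224 input
inception_map = rng.standard_normal((8, 8, 2048))     # InceptionV3 output shape for a 299x299 input
n_classes = 14                                        # hypothetical class count

# Hypothetical trainable parameters (random here, learned in practice).
w_att_resnet = 0.01 * rng.standard_normal(2048)
w_att_inception = 0.01 * rng.standard_normal(2048)
w_cls = 0.01 * rng.standard_normal((4096, n_classes))
b_cls = np.zeros(n_classes)

probs = fuse_and_classify(
    attention_pool(resnet_map, w_att_resnet),
    attention_pool(inception_map, w_att_inception),
    w_cls,
    b_cls,
)
print("predicted class index:", int(probs.argmax()))
```

With an all-zero attention vector the softmax weights become uniform and attention pooling reduces exactly to global average pooling, so the same head can run with or without attention.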

Abstract

Hand gesture recognition (HGR) is a critical area of computer vision that supports intuitive human–computer interaction and sign language communication, yet existing systems remain sensitive to lighting variations, background clutter, and diverse hand postures. This study makes two contributions to address these limitations: a Gradient-Based Augmentation Validation (GBAV) framework that establishes structurally safe augmentation ranges before training, and a multi-backbone Convolutional Neural Network (CNN) architecture that combines ResNet50 and InceptionV3 with optional attention-based pooling. GBAV uses magnitude-weighted gradient orientation histograms, together with Pearson correlation and Kullback–Leibler divergence thresholds, to verify label invariance under spatial transformations, providing a classifier-agnostic calibration mechanism applied before training. The framework is evaluated on three static gesture datasets, Indonesian Sign Language (BISINDO), American Sign Language (ASL), and Hand Gesture 14 (HG14), yielding validation accuracies of 96.87%, 99.92%, and 95.25%, respectively; 5-fold cross-validation on HG14 confirms result stability (93.51% ± 2.31%). Quantitative attention localization, cross-dataset transfer evaluation, and computational efficiency analysis (26.8 ms per image, ~37 FPS) further support the framework’s robustness and practical deployability. These findings identify GBAV-calibrated augmentation as the principal performance driver, complemented by the multi-backbone architecture, for robust hand gesture recognition across diverse visual contexts.
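The GBAV check described above can be sketched in NumPy under several assumptions the abstract does not settle: unsigned orientations in 18 bins, an acceptance rule requiring both a Pearson correlation of at least 0.9 and a KL divergence of at most 0.1, rotation as the spatial transformation under test, and a per-image sweep over candidate angles. The thresholds, bin count, and all function names (`orientation_histogram`, `rotate_nearest`, `is_label_invariant`, `safe_rotation_range`) are hypothetical illustrations, not the authors' implementation. Because the check looks only at image gradients, it needs no trained classifier.

```python
import numpy as np


def orientation_histogram(img: np.ndarray, n_bins: int = 18) -> np.ndarray:
    """Magnitude-weighted histogram of unsigned gradient orientations, normalized to sum to 1."""
    gy, gx = np.gradient(img.astype(np.float64))      # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    orientation = np.mod(np.arctan2(gy, gx), np.pi)   # fold into [0, pi): edge direction, not sign
    hist, _ = np.histogram(orientation, bins=n_bins, range=(0.0, np.pi), weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else np.full(n_bins, 1.0 / n_bins)


def rotate_nearest(img: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate about the image centre with nearest-neighbour sampling; uncovered pixels become 0."""
    h, w = img.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    # Inverse mapping: for every output pixel, find the input pixel it comes from.
    src_x = np.cos(theta) * (xx - cx) + np.sin(theta) * (yy - cy) + cx
    src_y = -np.sin(theta) * (xx - cx) + np.cos(theta) * (yy - cy) + cy
    sx, sy = np.rint(src_x).astype(int), np.rint(src_y).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out


def pearson(p: np.ndarray, q: np.ndarray) -> float:
    if np.std(p) == 0 or np.std(q) == 0:
        return 0.0                                    # correlation undefined for flat histograms
    return float(np.corrcoef(p, q)[0, 1])


def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-8) -> float:
    """KL(p || q) with a small epsilon so empty bins do not produce log(0) or division by zero."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))


def is_label_invariant(img, transformed, r_min=0.9, kl_max=0.1, n_bins=18) -> bool:
    """Hypothetical rule and thresholds: correlation and divergence checks must both pass."""
    p = orientation_histogram(img, n_bins)
    q = orientation_histogram(transformed, n_bins)
    return pearson(p, q) >= r_min and kl_divergence(p, q) <= kl_max


def safe_rotation_range(img, candidates, r_min=0.9, kl_max=0.1, n_bins=18) -> float:
    """Largest candidate angle whose +angle and -angle rotations both pass; 0.0 if none do."""
    safe = 0.0
    for angle in sorted(candidates):
        ok = all(
            is_label_invariant(img, rotate_nearest(img, s * angle), r_min, kl_max, n_bins)
            for s in (1, -1)
        )
        if not ok:
            break            # stop at the first failure so the accepted range stays contiguous
        safe = float(angle)
    return safe


rng = np.random.default_rng(0)
image = rng.random((64, 64))                          # stand-in for a grayscale gesture image
safe_angle = safe_rotation_range(image, candidates=[5, 10, 15, 20, 30])
print(f"largest rotation passing both checks: +/-{safe_angle} degrees")
```

A dataset-level calibration would presumably aggregate such per-image results over many samples; how GBAV performs that aggregation is not stated in the abstract.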
