Hand gesture recognition (HGR) is a critical area in computer vision that supports intuitive human–computer interaction and sign language communication, yet existing systems remain sensitive to lighting variations, background clutter, and diverse hand postures. This study introduces two contributions to address these limitations: a Gradient-Based Augmentation Validation (GBAV) framework that establishes structurally safe augmentation ranges before training, and a multi-backbone Convolutional Neural Network (CNN) architecture that combines ResNet50 and InceptionV3 with optional attention-based pooling. GBAV compares magnitude-weighted gradient orientation histograms of original and augmented images, applying Pearson correlation and Kullback–Leibler divergence thresholds to verify label invariance under spatial transformations, and thereby provides a classifier-agnostic pre-training calibration mechanism. The proposed framework is evaluated on three static gesture datasets: Indonesian Sign Language (BISINDO), American Sign Language (ASL), and Hand Gesture 14 (HG14). It yields validation accuracies of 96.87%, 99.92%, and 95.25%, respectively, and 5-fold cross-validation on HG14 confirms result stability (93.51% ± 2.31%). Quantitative attention localization, cross-dataset transfer evaluation, and computational efficiency analysis (26.8 ms per image, ~37 FPS) further support the framework’s robustness and practical deployability. These findings identify GBAV-calibrated augmentation as the principal performance driver, complementing the multi-backbone architecture for robust hand gesture recognition across diverse visual contexts.
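The GBAV check described above can be illustrated with a minimal sketch. It is not the authors' implementation: the central-difference gradient, the 9-bin unsigned orientation histogram, the smoothing constant, the threshold values (`min_corr=0.9`, `max_kl=0.1`), and all function names are assumptions chosen for illustration, since the abstract does not specify them.

```python
import math

def orientation_histogram(img, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `img` is a 2D list of grayscale values. The bin count and the
    central-difference gradient are assumptions, not values from the paper.
    """
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue  # flat pixels carry no orientation information
            theta = math.atan2(gy, gx) % math.pi  # fold into [0, pi)
            idx = min(int(theta / math.pi * bins), bins - 1)
            hist[idx] += mag  # weight each vote by gradient magnitude
    total = sum(hist)
    if total == 0.0:
        return [1.0 / bins] * bins  # featureless image: uniform histogram
    return [v / total for v in hist]

def pearson(p, q):
    """Pearson correlation between two equal-length histograms."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    var_p = sum((a - mp) ** 2 for a in p)
    var_q = sum((b - mq) ** 2 for b in q)
    if var_p == 0.0 or var_q == 0.0:
        return 1.0 if p == q else 0.0  # correlation undefined for constants
    return cov / math.sqrt(var_p * var_q)

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) with additive smoothing so empty bins stay finite."""
    ps = [a + eps for a in p]
    qs = [b + eps for b in q]
    sp, sq = sum(ps), sum(qs)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(ps, qs))

def is_structurally_safe(original, augmented, min_corr=0.9, max_kl=0.1):
    """Accept an augmentation only if both similarity tests pass.

    The thresholds are hypothetical placeholders, not the paper's values.
    """
    h0 = orientation_histogram(original)
    h1 = orientation_histogram(augmented)
    return pearson(h0, h1) >= min_corr and kl_divergence(h0, h1) <= max_kl
```

In this sketch, a horizontal flip of an image with a single vertical edge leaves the unsigned orientation histogram unchanged and passes the check, whereas transposing the image moves the gradient energy into a different bin and fails it. Sweeping an augmentation parameter (for example, rotation angle) and keeping the largest range for which the check passes would be one way to derive a safe range before training.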
Chen et al. studied this question.