This paper proposes a deep learning approach to human gesture recognition that supports real-time teleoperation of mobile robots in outdoor environments. A lightweight ResNet18 architecture is adapted via transfer learning and trained on a combination of a personalized dataset captured with a RealSense camera and a reorganized subset of HMDB. A modular pipeline was developed and evaluated under nine experimental configurations that vary the optimizer, the training dataset, and the level of backbone freezing. The results show that models trained on the combined data with adaptive fine-tuning strategies achieve high accuracy and generalize well under varied lighting and background conditions. The best configuration reached 97.9% test accuracy, reinforcing the potential of CNNs for robust gesture-based human–robot interaction.
Alves et al. studied this question.