What question did this study set out to answer?

The central aim is to develop a framework combining gesture recognition and object detection for intuitive human-robot interaction.

April 16, 2026 · Open Access

Edge Computing Approach to AI-Based Gesture for Human–Robot Interaction and Control

Key Points

Developed an edge-deployable, vision-based system using a single wrist-mounted RGB camera and an xArm collaborative robot.
Employed MediaPipe Hands for hand landmark extraction and trajectory analysis.
Used YOLO for task object detection, and ArUco for planar calibration in the robot workspace.
Utilized Kalman filtering and low-pass filtering for stable hand control signals.
Implemented an LSTM classifier for recognizing dynamic gestures.
Filtering techniques significantly reduced hand-tracking jitter.
Gesture recognition delivered stable command states for robot control.
Both the NVIDIA Jetson Orin Nano and the Raspberry Pi 5 (with a Hailo AI accelerator) supported real-time operation, with the Jetson consistently achieving lower runtime.
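
The planar calibration step named in the key points can be illustrated with a small sketch. It assumes the calibration amounts to a homography fitted from four marker centres; the source does not state which formulation is used. Marker detection is omitted, and the pixel and workspace coordinates, as well as the function names, are hypothetical values chosen for illustration.

```python
# Sketch: map image pixels to planar workspace coordinates with a homography.
# Assumption: four ArUco marker centres with known workspace positions (mm).

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate marker layout")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_homography(pixels, workspace):
    """Fit H (3x3, with H[2][2] fixed to 1) from four point correspondences."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(pixels, workspace):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = solve_linear(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def pixel_to_workspace(H, x, y):
    """Apply the homography and divide by the projective scale."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return u, v


# Hypothetical marker centres in the image (pixels) and on the table (mm).
PIXELS = [(100, 100), (500, 120), (520, 400), (90, 380)]
WORKSPACE = [(0, 0), (300, 0), (300, 200), (0, 200)]

H = fit_homography(PIXELS, WORKSPACE)
hand_xy = pixel_to_workspace(H, 310, 250)  # e.g. a fingertip landmark
```

Under this assumption, the same mapping would be applied to both the hand landmark and the detected object centre, so that both end up in one workspace frame.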

Abstract

This paper presents an edge-deployable, vision-based framework for human–robot interaction that uses an xArm collaborative robot, a single RGB camera mounted on the robot wrist, and lightweight AI-based perception modules. The system enables intuitive, contact-free control by combining hand understanding and object detection within a unified perception–decision–control pipeline. MediaPipe Hands extracts hand landmarks, from which continuous hand trajectories, static gestures, and dynamic gestures are derived. Task objects are detected using a YOLO-based model, and both hand and object observations are mapped into the robot workspace using ArUco-based planar calibration. To ensure stable robot motion, the hand control signal is smoothed using low-pass and Kalman filtering, while dynamic gestures such as waving are recognized using a lightweight LSTM classifier. The complete pipeline runs locally on edge hardware, specifically an NVIDIA Jetson Orin Nano and a Raspberry Pi 5 with a Hailo AI accelerator. The experimental evaluation covers trajectory stability, gesture recognition reliability, and runtime performance on both platforms. Results show that filtering significantly reduces hand-tracking jitter, that gesture recognition provides stable command states for control, and that both edge devices support real-time operation, with the Jetson achieving consistently lower runtime than the Raspberry Pi. The proposed system demonstrates the feasibility of low-cost edge AI solutions for responsive and practical human–robot interaction in collaborative industrial environments.
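
As a rough illustration of the smoothing step described in the abstract, the sketch below chains an exponential low-pass filter with a scalar Kalman filter on one coordinate of a noisy, stationary hand position. The filter order, the random-walk state model, the parameter values, and the jitter metric are all assumptions made for this example; the abstract does not specify them.

```python
import random


class LowPass:
    """First-order exponential low-pass filter."""

    def __init__(self, alpha):
        self.alpha = alpha  # weight of the newest sample, 0 < alpha <= 1
        self.y = None

    def update(self, x):
        if self.y is None:
            self.y = x
        else:
            self.y = self.alpha * x + (1 - self.alpha) * self.y
        return self.y


class Kalman1D:
    """Scalar Kalman filter with a random-walk (constant-position) model."""

    def __init__(self, q, r):
        self.q = q      # process noise variance (assumed)
        self.r = r      # measurement noise variance (assumed)
        self.x = None   # state estimate
        self.p = 1.0    # estimate variance

    def update(self, z):
        if self.x is None:
            self.x = z
            return self.x
        self.p += self.q                 # predict: uncertainty grows
        k = self.p / (self.p + self.r)   # Kalman gain
        self.x += k * (z - self.x)       # correct with the measurement
        self.p *= 1 - k
        return self.x


def smooth(samples, alpha=0.5, q=1e-4, r=1e-2):
    """Low-pass each sample, then feed the result to the Kalman filter."""
    lp, kf = LowPass(alpha), Kalman1D(q, r)
    return [kf.update(lp.update(s)) for s in samples]


def jitter(samples):
    """Mean absolute frame-to-frame change, used here as a jitter proxy."""
    steps = [abs(b - a) for a, b in zip(samples, samples[1:])]
    return sum(steps) / len(steps)


# Synthetic data: a hand held still at x = 0.5 with Gaussian tracking noise.
rng = random.Random(0)
raw = [0.5 + rng.gauss(0, 0.01) for _ in range(200)]
filtered = smooth(raw)
```

In a two-dimensional setting, one such filter pair per axis would be a simple option. A constant-velocity Kalman model is a common alternative when the hand is moving, since a random-walk model lags behind sustained motion.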
