Construction robots are a powerful driver of intelligent construction processes. User-friendly interfaces that support human-robot collaboration are critical to increasing the adoption of robots. Among the available interfaces, eye gaze and hand gestures are effective and reliable interaction cues in the noisy construction environment. This paper proposes a novel context-aware method that integrates eye tracking and gesture recognition for human-robot collaboration in construction. The proposed method employs a two-stream network architecture comprising a first-person view-based stream and a motion sensory data-based stream. The first-person view-based stream models the user's gaze with an attention module that generates an attention map, which helps the stream focus on the relevant spatiotemporal regions for context extraction. The motion sensory data-based stream processes the motion sensory data to extract features related to hand motions. Finally, the extracted vision context and motion features are combined to perform gesture recognition, which conveys a message between the worker and the robot. The method was tested on a dataset gathered on construction sites. The test results show that the proposed method achieves an accuracy of 96.8% and a mean class accuracy of 97.7%, illustrating its effectiveness for human-robot collaboration in construction.
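The two-stream idea described above can be illustrated with a minimal sketch. The abstract does not specify the backbone, the form of the attention module, or the fusion step, so everything here is an assumption made for illustration: a fixed Gaussian centred on the gaze point stands in for the attention module, per-channel mean and standard deviation stand in for the motion stream, and a single linear layer with softmax stands in for the classifier. All function names and shapes are hypothetical.

```python
import math
import random


def gaze_attention_map(gaze_xy, height, width, sigma=1.5):
    """Assumed stand-in for the attention module: a Gaussian centred on the
    gaze point, normalised so the map sums to 1."""
    gx, gy = gaze_xy
    att = [[math.exp(-((x - gx) ** 2 + (y - gy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)] for y in range(height)]
    total = sum(sum(row) for row in att)
    return [[v / total for v in row] for row in att]


def vision_context(frames, gaze_points):
    """First-person view stream (sketch). `frames` is a list of T frames, each
    a list of C channels, each an H x W grid. Each frame is pooled with its
    gaze attention map, then the result is averaged over time."""
    context = [0.0] * len(frames[0])
    for frame, gaze in zip(frames, gaze_points):
        height, width = len(frame[0]), len(frame[0][0])
        att = gaze_attention_map(gaze, height, width)
        for c, channel in enumerate(frame):
            context[c] += sum(channel[y][x] * att[y][x]
                              for y in range(height) for x in range(width))
    return [v / len(frames) for v in context]


def motion_features(samples):
    """Motion stream (sketch). `samples` is T readings of D sensor channels;
    returns the per-channel means followed by the per-channel std devs."""
    n = len(samples)
    means, stds = [], []
    for channel in zip(*samples):
        mean = sum(channel) / n
        means.append(mean)
        stds.append(math.sqrt(sum((v - mean) ** 2 for v in channel) / n))
    return means + stds


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def classify(frames, gaze_points, samples, weights, bias):
    """Fuse both streams by concatenation, then apply an assumed linear
    classifier to get gesture-class probabilities."""
    fused = vision_context(frames, gaze_points) + motion_features(samples)
    logits = [sum(w * f for w, f in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


# Toy inputs with random values; not data from the paper.
rng = random.Random(0)
T, C, H, W, D, K = 3, 4, 5, 5, 6, 3
frames = [[[[rng.random() for _ in range(W)] for _ in range(H)]
           for _ in range(C)] for _ in range(T)]
gaze_points = [(2.0, 2.0), (2.5, 1.5), (3.0, 2.0)]
samples = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(T)]
weights = [[rng.gauss(0.0, 0.1) for _ in range(C + 2 * D)] for _ in range(K)]
bias = [0.0] * K

probs = classify(frames, gaze_points, samples, weights, bias)
predicted_class = max(range(K), key=probs.__getitem__)
```

In this sketch the weights are random, so `predicted_class` carries no meaning; the point is only the data flow: gaze-weighted pooling of visual features, a separate summary of motion data, and late fusion by concatenation before classification.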
Wang et al. studied this question.