Human-to-robot (H2R) handovers are critical in human–robot interaction but are hampered by complex environments that degrade robot perception. Traditional RGB-based perception methods suffer severe performance degradation under harsh lighting (e.g., glare and darkness). Furthermore, H2R handovers take place in unstructured environments rich in fine-grained visual details, such as multi-angle hand configurations and novel object geometries, where conventional semantic segmentation and grasp generation approaches struggle to generalize. To overcome lighting disturbances, we present an H2R handover system with a dual-path perception pipeline. Under normal lighting, the system fuses perception data from an eye-in-hand stereo RGB-D camera and a fixed-scene time-of-flight (ToF) camera. Under glare or darkness, it switches to the ToF camera alone for reliable perception. In parallel, to handle complex spatial and geometric features, we augment the Point Transformer v3 (PTv3) architecture with a T-Net module and a self-attention mechanism that fuses relative positional angle features between the human and the robot. This enables efficient, real-time 3D semantic segmentation of both the object and the human hand. For grasp generation, we extend GraspNet with a grasp selection module optimized for H2R scenarios.
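The dual-path switching rule can be illustrated with a minimal sketch. The abstract does not say how adverse lighting is detected, so this sketch makes several assumptions. It uses a hypothetical brightness test: low mean gray level means darkness, and a large fraction of saturated pixels means glare. The thresholds `DARK_MEAN_INTENSITY` and `GLARE_SATURATED_FRACTION`, the `Frame` container, and the function names are all illustrative. The sketch also assumes both point clouds are already expressed in the robot base frame.

```python
from dataclasses import dataclass

import numpy as np

# Hypothetical thresholds: the source does not specify how lighting is assessed.
DARK_MEAN_INTENSITY = 40.0       # mean 8-bit gray level below which the scene counts as dark
GLARE_SATURATED_FRACTION = 0.15  # fraction of near-saturated pixels above which glare is assumed


@dataclass
class Frame:
    rgb_gray: np.ndarray     # HxW grayscale image from the eye-in-hand RGB-D camera
    rgbd_points: np.ndarray  # Nx3 RGB-D points, assumed already in the robot base frame
    tof_points: np.ndarray   # Mx3 ToF points, assumed already in the robot base frame


def lighting_is_adverse(gray: np.ndarray) -> bool:
    """Flag darkness (low mean intensity) or glare (many saturated pixels)."""
    too_dark = float(gray.mean()) < DARK_MEAN_INTENSITY
    glare = float((gray >= 250).mean()) > GLARE_SATURATED_FRACTION
    return too_dark or glare


def select_point_cloud(frame: Frame) -> tuple[str, np.ndarray]:
    """Fuse both sources under normal light; fall back to the ToF camera otherwise."""
    if lighting_is_adverse(frame.rgb_gray):
        return "tof_only", frame.tof_points
    # Naive fusion: stack clouds that share a frame. A real system would also
    # deduplicate or voxel-downsample the overlapping region.
    return "fused", np.vstack([frame.rgbd_points, frame.tof_points])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts_a, pts_b = rng.random((100, 3)), rng.random((80, 3))
    normal = Frame(np.full((480, 640), 120, dtype=np.uint8), pts_a, pts_b)
    dark = Frame(np.full((480, 640), 10, dtype=np.uint8), pts_a, pts_b)
    mode, cloud = select_point_cloud(normal)
    print(mode, cloud.shape)              # fused (180, 3)
    print(select_point_cloud(dark)[0])    # tof_only
```

A fixed ToF camera measures depth with its own active illumination, so it degrades far less under glare or darkness than passive RGB. That property is why falling back to it, rather than trying to repair the RGB stream, is a plausible design.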
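The two segmentation add-ons named above are a T-Net alignment module and self-attention that fuses a human–robot angle feature. The numpy sketch below shows one way these pieces could fit together, using untrained random weights. It is not the authors' network. The (sin, cos) angle encoding, the layer sizes, and the class names are assumptions. Single-head full attention is also a simplification: PTv3 itself restricts attention to serialized local patches of points.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class TNet:
    """PointNet-style T-Net: predicts a 3x3 alignment matrix from the whole cloud."""

    def __init__(self, hidden: int = 64):
        self.w1 = rng.normal(0, 0.1, (3, hidden))
        self.w2 = rng.normal(0, 0.01, (hidden, 9))  # small init keeps output near identity

    def __call__(self, pts: np.ndarray) -> np.ndarray:  # pts: (N, 3)
        # Shared per-point MLP followed by a symmetric max-pool -> order-invariant (hidden,)
        global_feat = relu(pts @ self.w1).max(axis=0)
        transform = np.eye(3) + (global_feat @ self.w2).reshape(3, 3)
        return pts @ transform


class AngleSelfAttention:
    """Single-head self-attention over per-point features concatenated with a
    hypothetical (sin, cos) encoding of the human-robot relative angle."""

    def __init__(self, in_dim: int, d: int = 32):
        self.wq, self.wk, self.wv = (rng.normal(0, 0.1, (in_dim + 2, d)) for _ in range(3))
        self.d = d

    def __call__(self, feats: np.ndarray, angle_rad: float) -> np.ndarray:
        enc = np.array([np.sin(angle_rad), np.cos(angle_rad)])
        x = np.hstack([feats, np.tile(enc, (feats.shape[0], 1))])  # (N, in_dim + 2)
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        attn = softmax(q @ k.T / np.sqrt(self.d))  # (N, N) attention weights
        return attn @ v                            # (N, d) angle-aware features


if __name__ == "__main__":
    pts = rng.random((256, 3))
    aligned = TNet()(pts)
    out = AngleSelfAttention(in_dim=3)(aligned, np.deg2rad(45))
    print(aligned.shape, out.shape)  # (256, 3) (256, 32)
```

Because the T-Net pools with `max`, its predicted transform does not depend on point order. This matters for unordered point clouds captured from different handover angles.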
We validate our approach through four sets of experiments. (1) We built a semantic segmentation dataset of 7500 annotated point clouds covering 15 objects and 5 relative angles, and tested on 750 point clouds from 15 unseen objects. Our method achieves 84.4% mIoU, outperforming Swin3D-L by 3.26 percentage points with 3.2× faster inference. (2) In 250 real-world handover trials comparing our method with the baseline across 5 objects, 5 hand postures, and 5 angles, our method improves the success rate by 18.4 percentage points. (3) In 450 trials under controlled adverse lighting (darkness and glare), our dual-path perception method achieves 82.7% overall success, surpassing single-camera baselines by up to 39.4 percentage points. (4) In a comparison against a state-of-the-art multimodal H2R handover method under identical adverse lighting, our system achieves 75.0% success (15/20) versus the baseline’s 15.0% (3/20), further confirming the lighting robustness of our design. These results demonstrate the system’s robustness and generalization in challenging H2R handover scenarios.
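The results above use two metrics: mean intersection-over-union (mIoU) for segmentation and success rate for handover trials. They also report gaps between methods in percentage points, meaning absolute differences, not relative ones. The sketch below shows the standard computations. The three-label scheme (for example background, object, hand) and the toy arrays are illustrative assumptions, not the paper's data.

```python
import numpy as np


def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean IoU over classes that appear in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (target == c))
        union = np.sum((pred == c) | (target == c))
        if union > 0:  # skip classes absent from both arrays
            ious.append(inter / union)
    return float(np.mean(ious))


def success_rate(successes: int, trials: int) -> float:
    """Success rate in percent."""
    return 100.0 * successes / trials


if __name__ == "__main__":
    # Toy per-point labels, e.g. 0 = background, 1 = object, 2 = hand (assumed scheme).
    pred = np.array([0, 0, 1, 1, 2, 2])
    target = np.array([0, 1, 1, 1, 2, 0])
    print(mean_iou(pred, target, 3))  # (1/3 + 2/3 + 1/2) / 3 = 0.5
    ours, base = success_rate(15, 20), success_rate(3, 20)
    print(ours, base, ours - base)    # 75.0 15.0 60.0 (gap in percentage points)
```

Applied to experiment (4), the 15/20 versus 3/20 outcome is a 60-percentage-point absolute gap, equivalently a fivefold relative improvement.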
Yifei Wang, Baoguo Xu, Huijun Li (Southeast University)
Biomimetics
synapsesocial.com/papers/69cf5e5f5a333a821460cb91
DOI: https://doi.org/10.3390/biomimetics11040231