Target-oriented grasping has become increasingly important in household and industrial environments, yet deploying such systems on mobile robots is particularly challenging because of their limited computational resources. To address this, we present an efficient framework for real-time target-oriented grasping on resource-constrained platforms that supports both click-based grasping of unknown objects and category-based grasping of known objects. To reduce model complexity while maintaining detection accuracy, YOLOv8 is compressed using a structured pruning method. For grasp pose generation, a pretrained GR-ConvNetv2 predicts candidate grasps, which are restricted to the target object using masks generated by MobileSAMv2. A geometry-based correction module then adjusts the position, angle, and width of these initial grasp poses to improve grasp accuracy. Extensive experiments were carried out on the Cornell and Jacquard datasets, as well as in real-world single-object, cluttered, and stacked scenarios. The proposed framework achieves grasp success rates of 98.8% on the Cornell dataset and 95.8% on the Jacquard dataset, and over 90% in real-world single-object and cluttered settings, while running in real time at 67 ms and 75 ms per frame in the click-based and category-based modes, respectively. These results demonstrate that the framework combines high grasping accuracy and robust performance with an efficient design suitable for deployment on mobile and resource-constrained robots.
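To illustrate how a segmentation mask can restrict grasp candidates to the target object, the following minimal sketch picks the best grasp from dense per-pixel grasp maps. It assumes the grasp network outputs HxW maps of grasp quality, cos 2θ, sin 2θ, and gripper width (the encoding used by GR-ConvNet-style models) and that the mask is a boolean HxW array such as one derived from a MobileSAMv2 prediction. The function name `select_masked_grasp` is hypothetical, this is not the authors' implementation, and the geometry-based correction step is not shown.

```python
import numpy as np


def select_masked_grasp(quality, cos2t, sin2t, width, mask):
    """Return the highest-quality grasp whose center lies inside the target mask.

    Hypothetical helper (illustration only). All inputs are HxW arrays;
    `mask` is boolean, with True marking pixels of the target object.
    Returns (row, col, angle_rad, width), or None if the mask is empty.
    """
    if quality.shape != mask.shape:
        raise ValueError("quality map and mask must have the same shape")
    if not mask.any():
        return None
    # Suppress every pixel outside the target so the argmax can only land on it.
    masked_q = np.where(mask, quality, -np.inf)
    row, col = np.unravel_index(np.argmax(masked_q), masked_q.shape)
    # Angles are encoded as (cos 2θ, sin 2θ) to handle the π symmetry of
    # parallel-jaw grasps; halving atan2 recovers θ in (-π/2, π/2].
    angle = 0.5 * np.arctan2(sin2t[row, col], cos2t[row, col])
    return int(row), int(col), float(angle), float(width[row, col])


if __name__ == "__main__":
    q = np.zeros((4, 4))
    q[0, 0] = 0.9  # strongest grasp overall, but outside the target
    q[2, 3] = 0.6  # best grasp on the target
    m = np.zeros((4, 4), dtype=bool)
    m[2:, 2:] = True
    cos2t, sin2t = np.ones((4, 4)), np.zeros((4, 4))
    w = np.full((4, 4), 30.0)
    print(select_masked_grasp(q, cos2t, sin2t, w, m))  # (2, 3, 0.0, 30.0)
```

Without the mask, the argmax would select the higher-scoring pixel at (0, 0), which belongs to a different object; masking before the argmax is what makes the selection target-oriented.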
Han et al. (Sun,) studied this question.