Industrial grasping requires accurate pose estimation and identity-aware object selection, yet most deep-learning grasp detectors are object-agnostic and computationally heavy, which limits goal-directed manipulation and deployment on lightweight embedded systems. This paper presents a robot grasping system that combines RGB-based grasp detection and depth-based 3D localization with low-cost robot control. We use YOLOv11 with Oriented Bounding Boxes (YOLOv11-OBB) to predict object pose and class simultaneously from RGB images. These detections are combined with depth data from an Intel RealSense D435 RGB-D camera to compute a 3D grasp pose. A 4-DOF robot arm controlled via a PLC performs pick-and-place operations based on the estimated poses. The paper evaluates two scenarios: a grasp-only model trained on a combination of the Cornell dataset and custom real-world datasets, and a grasp+classification model that enables selective manipulation of multiple object types. Experimental results show that the grasp-only model achieves 99.5% mAP@0.5, 94.0% mAP@0.5:0.95, and 99.4% precision at an IoU threshold of 0.6, with an inference time of 29 ms on the tested hardware. Compared with several representative grasp detection methods, the proposed approach achieves competitive accuracy and real-time performance. The grasp+classification model achieves a grasp success rate above 97% across various object types with only 619 training images, indicating good performance under the tested experimental conditions despite the limited dataset size.
Vo et al. (Fri,) studied this question.
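The abstract describes combining an oriented-bounding-box detection with a depth reading to obtain a 3D grasp pose but does not give the computation. The following is a minimal sketch of one common way to do this, assuming an ideal pinhole camera model with no lens distortion. The names `Intrinsics`, `GraspPose`, `deproject`, and `grasp_pose_from_obb` are hypothetical and are not taken from the paper, and the intrinsic values in the usage note are illustrative rather than actual D435 calibration values.

```python
import math
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical container)."""
    fx: float
    fy: float
    cx: float
    cy: float


@dataclass
class GraspPose:
    """Grasp position in the camera frame (metres) plus in-plane rotation (radians)."""
    x: float
    y: float
    z: float
    yaw: float


def deproject(u: float, v: float, depth_m: float, intr: Intrinsics) -> tuple[float, float, float]:
    """Back-project pixel (u, v) with depth in metres to a 3D point.

    Assumes an ideal pinhole model; lens distortion is ignored.
    """
    x = (u - intr.cx) * depth_m / intr.fx
    y = (v - intr.cy) * depth_m / intr.fy
    return x, y, depth_m


def grasp_pose_from_obb(center_u: float, center_v: float, angle_rad: float,
                        depth_m: float, intr: Intrinsics) -> GraspPose:
    """Combine an oriented-box centre and angle with a depth reading.

    The angle is wrapped to [-pi/2, pi/2) on the assumption that a
    parallel-jaw gripper is symmetric under a 180-degree rotation.
    """
    if depth_m <= 0:
        # Depth sensors commonly report 0 for invalid pixels.
        raise ValueError("invalid depth reading")
    x, y, z = deproject(center_u, center_v, depth_m, intr)
    yaw = (angle_rad + math.pi / 2) % math.pi - math.pi / 2
    return GraspPose(x, y, z, yaw)
```

For example, with illustrative intrinsics `Intrinsics(fx=600, fy=600, cx=320, cy=240)`, a box centred at pixel (380, 240) with a depth of 0.5 m maps to a point 0.05 m to the right of the optical axis and 0.5 m in front of the camera. A real pipeline would additionally transform this camera-frame pose into the robot base frame using a hand-eye calibration, which this sketch omits.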