6D object pose estimation is a key component of robotic bin picking in cluttered environments for autonomous manufacturing, and many methods have been proposed to improve its accuracy. However, single depth camera systems struggle to achieve the millimeter-level precision that manufacturing requires, and reflective, low-feature objects complicate pose estimation further. This study proposes a technique that improves pose estimation accuracy for graspable objects by integrating and analyzing multi-view point clouds obtained from a stationary depth camera and a mobile RGB-D camera mounted on the end-effector of a manipulator. After the point cloud of the most graspable object is retrieved using YOLOv8, the manipulator takes multiple shots from planned viewpoints, and the resulting point clouds are merged and filtered using the Random Sample Consensus (RANSAC) and Farthest Point Sampling (FPS) algorithms. The reconstructed point cloud is then compared with the reference to estimate the pose of the object. Experimental results demonstrate that, by using twelve multi-view captures rather than a single camera, the approach achieves a positional error of less than 2 mm along the z-axis and an angular error below 2 degrees for reflective and textureless objects. This research presents a cost-effective and accurate 6D object pose estimation solution, highlighting its potential to contribute to high-precision automation systems in manufacturing.
Lee et al. studied this question.
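The merge-and-filter stage described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions made for illustration: that each view comes with a known 4x4 camera-to-base pose, that RANSAC is used to fit and remove a dominant plane (the abstract does not say which model is fitted), and that a 2 mm inlier threshold is appropriate. The function names `merge_views`, `ransac_plane`, and `farthest_point_sampling`, as well as the synthetic scene, are hypothetical.

```python
import numpy as np


def merge_views(clouds, poses):
    """Transform each (N_i, 3) cloud by its 4x4 camera-to-base pose and stack them."""
    merged = []
    for pts, T in zip(clouds, poses):
        merged.append(pts @ T[:3, :3].T + T[:3, 3])
    return np.vstack(merged)


def ransac_plane(points, threshold=0.002, iterations=200, seed=0):
    """Fit a dominant plane n.x + d = 0 with RANSAC; return (n, d, inlier_mask)."""
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    best_model = (np.array([0.0, 0.0, 1.0]), 0.0)
    for _ in range(iterations):
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:  # degenerate (collinear) sample, skip it
            continue
        normal = normal / norm
        d = -normal @ sample[0]
        mask = np.abs(points @ normal + d) < threshold
        if mask.sum() > best_mask.sum():
            best_mask, best_model = mask, (normal, d)
    return best_model[0], best_model[1], best_mask


def farthest_point_sampling(points, k, start=0):
    """Pick k indices so that each new point is farthest from those already chosen."""
    chosen = np.empty(k, dtype=int)
    chosen[0] = start
    dist = np.linalg.norm(points - points[start], axis=1)
    for i in range(1, k):
        chosen[i] = int(np.argmax(dist))
        new_dist = np.linalg.norm(points - points[chosen[i]], axis=1)
        dist = np.minimum(dist, new_dist)
    return chosen


# Synthetic scene (all values are made up for illustration): a flat bin floor
# at z = 0 seen in one view, plus a few object points 5 cm above it seen in another.
gx, gy = np.meshgrid(np.linspace(0.0, 0.09, 10), np.linspace(0.0, 0.04, 5))
floor = np.column_stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)])
obj = np.array([[0.02, 0.01, 0.05], [0.03, 0.01, 0.05], [0.02, 0.02, 0.05],
                [0.03, 0.02, 0.05], [0.025, 0.015, 0.05]])

shift = np.eye(4)
shift[:3, 3] = [0.0, 0.0, 0.1]   # assumed pose: second camera offset 10 cm in z
view_a = floor                    # already expressed in the base frame
view_b = obj - shift[:3, 3]       # object points expressed in the second camera frame

merged = merge_views([view_a, view_b], [np.eye(4), shift])
normal, d, floor_mask = ransac_plane(merged)
object_points = merged[~floor_mask]   # keep what is not on the dominant plane
keep = farthest_point_sampling(object_points, k=3)
```

In this sketch, FPS serves only to thin the filtered cloud to a fixed number of well-spread points before it is compared with the reference. A real pipeline would more likely rely on a point cloud library than on hand-written loops.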