Image-based object recognition and pose estimation are highly active research areas because of their importance for robotic perception and interaction. While modern CNN-based pose estimators achieve strong results, they offer little transparency about the trustworthiness and precision of individual estimates. This lack of certainty hinders further processing of the results and deters deployment in production environments because of reliability concerns. In response, this thesis proposes a fusion-based approach in which, thanks to a novel output architecture, the CNN self-estimates the amount of information obtained, yielding an individual 6D uncertainty estimate for each 6D pose estimate. Specifically, the CNN predicts the observed object points pixel-wise, along with the precision of those predictions in the image plane. All perspective information gathered in this way is then fused (without linearization) into a single, globally valid 13 × 13 information matrix, which is then regressed to yield the six-dimensional result. This separation allows the CNN to operate solely in image space, whereas the conversion from 2D image space to 6D pose is solved analytically. Additionally, the globally valid information matrix, as an intermediate result, makes fusion with auxiliary information such as depth, stereo, and prior knowledge straightforward, since it amounts to a 13 × 13 matrix addition. With this approach, the pose is regressed from a fusion of all available data, in contrast to the more ad hoc approach of combining estimates in postprocessing. Moreover, the CNN call is wholly unaffected by the addition of these supplemental data. An extensive evaluation of the proposed architecture on multiple benchmark datasets shows meaningful uncertainty estimates while maintaining competitive pose performance.
It also shows that adding auxiliary information can significantly improve pose performance, always in proportion to the amount of new information gained, while maintaining the quality of the estimated uncertainty.
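To illustrate why fusion in information form reduces to a matrix addition, the following is a minimal sketch of the standard linear-Gaussian case. It is not the 13 × 13, linearization-free formulation of the thesis: the two-dimensional state, the function names (`to_information`, `fuse`, `recover`), and all numeric values are assumptions chosen purely for illustration.

```python
import numpy as np

def to_information(mean, cov):
    """Convert a Gaussian (mean, covariance) into information form.

    Returns the information matrix (inverse covariance) and the
    information vector (information matrix times mean).
    """
    info = np.linalg.inv(cov)
    return info, info @ mean

def fuse(*sources):
    """Fuse independent sources by adding their information terms."""
    info = sum(s[0] for s in sources)
    vec = sum(s[1] for s in sources)
    return info, vec

def recover(info, vec):
    """Recover the fused mean and covariance from information form."""
    cov = np.linalg.inv(info)
    return cov @ vec, cov

# Hypothetical toy values: an image-based estimate that is precise along
# the first axis but vague along the second, and an auxiliary source
# (for example depth) with the opposite characteristics.
image_source = to_information(np.array([1.0, 2.0]), np.diag([0.01, 1.0]))
depth_source = to_information(np.array([1.5, 2.4]), np.diag([1.0, 0.01]))

fused_info, fused_vec = fuse(image_source, depth_source)
fused_mean, fused_cov = recover(fused_info, fused_vec)

print("fused mean:", fused_mean)
print("fused covariance diagonal:", np.diag(fused_cov))
```

In this toy setting, each source dominates the fused estimate along the axis where it is precise, and the fused covariance is smaller than either input. An additional source would enter `fuse` as just another summand, which mirrors the idea that auxiliary data can be added without touching the image-based part.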
Jesse Richter-Klug studied this question.