Category‐level 6D pose estimation predicts the translation and rotation of unseen object instances from known categories without relying on Computer‐Aided Design (CAD) models. The core challenge is intraclass generalization, because the shapes of individual instances are unknown. We propose an approach that addresses this challenge by decomposing objects into constituent components and exploiting prior knowledge of the consistent spatial structure among them. Specifically, we recast the complex global pose matching problem as a joint optimization over multiple components. These components are represented by a small number of rigid geometric primitives (e.g., cylinders, cubes), and estimating their 3D poses from an RGB image involves analyzing the geometric distribution of their 2D projections. We use neural networks to learn component‐level spatial relationships and directly regress the object's pose and scale from a single RGB image. Key innovations include modeling structural consistency across components with simple networks, jointly predicting component‐scale variations, and applying a normalization strategy for multiscale component structures to improve pose accuracy. The method has a very low training cost and generalizes strongly. On the REAL275 dataset, our approach achieves state‐of‐the‐art performance in single‐view 6D pose estimation, with real‐time inference exceeding 400 FPS.
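The abstract says that component poses are recovered by analyzing the 2D projections of rigid primitives. The sketch below illustrates only the standard geometry behind that idea, under several assumptions. It uses a pinhole camera with intrinsics fx, fy, cx, cy. Each component is modeled as a cuboid placed at a hypothetical translation `offset` in the object frame, and the object has a uniform scale. The final step centers the projected points and scales them to unit RMS distance. That step is a generic stand-in for the paper's unspecified multiscale normalization strategy, not the authors' method. All function names are hypothetical, and no learned network is involved.

```python
import math
from typing import List, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def rot_z(theta: float) -> Mat3:
    """Rotation matrix about the camera z-axis (used here only for illustration)."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def cuboid_corners(size: Vec3) -> List[Vec3]:
    """The 8 corners of an axis-aligned cuboid primitive centered at the origin."""
    hx, hy, hz = (s / 2 for s in size)
    return [(sx * hx, sy * hy, sz * hz)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]


def project_component(corners_local: List[Vec3], offset: Vec3,
                      object_R: Mat3, object_t: Vec3, object_scale: float,
                      fx: float, fy: float, cx: float, cy: float) -> List[Vec2]:
    """Place a component in the object frame, apply the object pose, and project to pixels."""
    pixels = []
    for p in corners_local:
        # Component frame -> object frame (hypothetical translation-only offset),
        # then apply the uniform object scale.
        q = tuple(object_scale * (p[i] + offset[i]) for i in range(3))
        # Object frame -> camera frame: rotate, then translate.
        r = mat_vec(object_R, q)
        X, Y, Z = (r[i] + object_t[i] for i in range(3))
        if Z <= 0:
            raise ValueError("point lies behind the camera")
        # Pinhole projection.
        pixels.append((fx * X / Z + cx, fy * Y / Z + cy))
    return pixels


def normalize_2d(points: List[Vec2]) -> List[Vec2]:
    """Center 2D points at their mean and scale them to unit RMS distance."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    rms = math.sqrt(sum(x * x + y * y for x, y in centered) / n)
    if rms == 0:
        return centered  # degenerate case: all points coincide
    return [(x / rms, y / rms) for x, y in centered]


if __name__ == "__main__":
    # Hypothetical component: a 0.2 m cube shifted 5 cm along the object x-axis.
    corners = cuboid_corners((0.2, 0.2, 0.2))
    pts = project_component(corners, (0.05, 0.0, 0.0), rot_z(0.3),
                            (0.0, 0.0, 1.5), 1.2, 500.0, 500.0, 320.0, 240.0)
    for u, v in normalize_2d(pts):
        print(f"{u:+.3f} {v:+.3f}")
```

Plain tuples keep the example dependency-free, and a real pipeline would more likely use array libraries and batched tensors. The normalization removes the dependence of the 2D layout on image position and apparent size. This is one plausible reason to normalize component structures of different scales before regression.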
Xuyang Li
Mingxuan Yu
Xuemei Xie
Advanced Intelligent Systems
Xidian University
Peng Cheng Laboratory
Xi’an University of Posts and Telecommunications
www.synapsesocial.com/papers/69d49f44b33cc4c35a227c68
DOI: https://doi.org/10.1002/aisy.202501321