April 7, 2026 | Open Access

SCP‐Pose: Leveraging Structural Consistency Prior Knowledge for Real‐Time Category‐Level 6D Pose Estimation

Key points

The aim is to improve category-level 6D pose estimation of unseen objects by leveraging structural consistency.
Developed a novel approach using decomposition of objects into components.
Utilized rigid geometric primitives to represent objects' structures.
Employed neural networks for learning spatial relationships and regressing poses from RGB images.
Implemented a normalization strategy to handle multiscale components for accurate pose estimation.
Achieved state-of-the-art performance in single-view 6D pose estimation.
Real-time inference exceeds 400 FPS on the REAL275 dataset.
Demonstrated strong generalization across various object instances with low training costs.

Abstract

Category‐level 6D pose estimation predicts the translation and rotation of unseen object instances from known classes without relying on computer‐aided design (CAD) models. The core challenge is intraclass generalization, because the shape of each instance is unknown. We propose a novel approach that addresses this by decomposing objects into constituent components and leveraging prior knowledge of the consistent spatial structure among them. Specifically, we recast the globally complex pose‐matching problem as a joint optimization over multiple components. These components are represented by a few rigid geometric primitives (e.g., cylinders, cubes), and estimating their 3D pose from an RGB image involves analyzing the geometric distribution of their 2D projections. We employ neural networks to learn component‐level spatial relationships and directly regress the object's pose and scale from a single RGB image. Key innovations include using simple networks to model structural consistency across components, jointly predicting component‐scale variations, and applying a normalization strategy for multiscale component structures to improve pose accuracy. The method features very low training cost and strong generalization. Evaluation on the REAL275 dataset shows that our approach achieves state‐of‐the‐art performance in single‐view 6D pose estimation, with real‐time inference exceeding 400 FPS.
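
To make the idea of "the geometric distribution of 2D projections" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a standard pinhole camera, uses a unit cube as a stand-in primitive, and applies a simple centre-and-RMS normalization as one plausible way to make components of different sizes comparable. All function names, the intrinsics, and the normalization itself are hypothetical choices made for illustration; the abstract does not specify any of them.

```python
import math

def rotation_z(theta):
    """Rotation about the camera z-axis as a row-major 3x3 list (illustrative pose)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def unit_cube_corners():
    """Eight corners of a cube centred at the origin with side length 1."""
    return [(x, y, z) for x in (-0.5, 0.5) for y in (-0.5, 0.5) for z in (-0.5, 0.5)]

def project(points, R, t, scale, fx, fy, cx, cy):
    """Map model points to pixels: X_cam = scale * R @ X + t, then pinhole projection."""
    pixels = []
    for p in points:
        cam = [scale * sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
        pixels.append((fx * cam[0] / cam[2] + cx, fy * cam[1] / cam[2] + cy))
    return pixels

def normalize_2d(pixels):
    """Centre the 2D points and divide by their RMS radius.

    Assumed normalization for illustration only, not the paper's strategy.
    """
    n = len(pixels)
    mx = sum(u for u, _ in pixels) / n
    my = sum(v for _, v in pixels) / n
    rms = math.sqrt(sum((u - mx) ** 2 + (v - my) ** 2 for u, v in pixels) / n)
    return [((u - mx) / rms, (v - my) / rms) for u, v in pixels]

# Assumed camera intrinsics (hypothetical values).
K = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0)

# The same primitive at two component scales, under the same pose.
pose_R, pose_t = rotation_z(0.4), (0.0, 0.0, 3.0)
small = project(unit_cube_corners(), pose_R, pose_t, 0.1, **K)
large = project(unit_cube_corners(), pose_R, pose_t, 0.5, **K)

# After normalization both point sets have zero mean and unit RMS radius,
# so a downstream regressor would see inputs of comparable magnitude.
small_n, large_n = normalize_2d(small), normalize_2d(large)
```

Because of perspective, the two normalized point sets are similar but not identical; a learned model would still need to account for that residual difference, which is consistent with the abstract's mention of jointly predicting component‐scale variations.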
