What question did this study set out to answer?

This research aims to enhance point cloud scene flow estimation by addressing challenges in feature matching and ambiguous correspondences.

May 21, 2026 · Open Access

SetConv++: Point Cloud Scene Flow Estimation Constrained by Feature Self-Supervision

Key Points

Introduced a novel method combining discriminative feature learning and probabilistic flow refinement.
Developed the SetConv++ architecture for better point feature representation.
Implemented a refinement module using random walk for adjusting flow estimates.
Reduced Endpoint Error (EPE) on the KITTI Scene Flow dataset by 13.6%, from 0.0411 to 0.0355, relative to baseline self-supervised approaches.
Improved Accuracy Strict (AS) by 2.43 percentage points, from 92.68% to 95.11%.
Decreased the outlier rate (Out) by 1.5 percentage points.
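
The metrics quoted in the key points can be made concrete with a short sketch. The Python below is an illustration only, not the paper's evaluation code: the function names `scene_flow_metrics` and `relative_reduction` are hypothetical, and the thresholds (0.05 for strict accuracy, 0.3 and 0.1 for outliers) follow a convention common in the scene flow literature, which this summary does not itself state.

```python
import math


def scene_flow_metrics(pred, gt):
    """Return (EPE, strict accuracy, outlier rate) for per-point 3D flows.

    pred, gt: equal-length sequences of (dx, dy, dz) tuples.
    The thresholds below are ASSUMED from common practice; they are not
    taken from the paper summary.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")

    errors, rel_errors = [], []
    for p, g in zip(pred, gt):
        err = math.dist(p, g)                 # per-point endpoint error
        gt_norm = math.hypot(*g)              # magnitude of the true flow
        errors.append(err)
        rel_errors.append(err / max(gt_norm, 1e-4))  # guard against zero flow

    n = len(errors)
    epe = sum(errors) / n
    # Strict accuracy: fraction of points with small absolute OR relative error.
    acc_strict = sum(e < 0.05 or r < 0.05 for e, r in zip(errors, rel_errors)) / n
    # Outliers: fraction of points with large absolute OR relative error.
    outliers = sum(e > 0.3 or r > 0.1 for e, r in zip(errors, rel_errors)) / n
    return epe, acc_strict, outliers


def relative_reduction(before, after):
    """Relative decrease of a metric, as a fraction of its starting value."""
    return (before - after) / before
```

For instance, `relative_reduction(0.0411, 0.0355)` evaluates to roughly 0.136, which is consistent with the 13.6% EPE reduction reported here. The accuracy and outlier changes are stated in percentage points, so they are plain differences (95.11 - 92.68 = 2.43) rather than relative changes.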

Abstract

Point cloud scene flow estimation aims to capture the three-dimensional motion of each point in a sequence of point clouds. Despite recent progress, existing methods still face two key problems. First, local information in the source point cloud may lack a counterpart in the target, which prevents correct feature matching. Second, target regions often contain highly similar adjacent structures whose point features are indistinguishable, which leads to ambiguous correspondences. To address these problems, this paper introduces a novel self-supervised method for point cloud scene flow estimation. Theoretically, we establish a new framework that integrates discriminative feature learning with probabilistic flow refinement. A new network architecture, SetConv++, is designed to learn more discriminative point feature representations, improving differentiation between similar structures. Additionally, a refinement module uses the random walk algorithm to adjust the initial flow estimates: it reconstructs low-confidence flows from high-confidence neighboring ones, mitigating the missing-correspondence problem. Crucially, a new flow smoothing loss term enforces local consistency while suppressing error propagation, a fundamental limitation of existing methods. In comprehensive experiments on the KITTI Scene Flow dataset, our method significantly outperforms existing self-supervised approaches across multiple standard evaluation metrics. Compared to baseline self-supervised approaches, it reduces the Endpoint Error (EPE) by 13.6% (from 0.0411 to 0.0355), improves Accuracy Strict (AS) by 2.43 percentage points (from 92.68% to 95.11%), and reduces the outlier rate (Out) by 1.5 percentage points.
This advancement not only provides a robust theoretical framework for handling ambiguous correspondences but also enables more reliable and efficient downstream applications, such as autonomous driving perception systems that require real-time motion accuracy in complex scenes.
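
The idea of reconstructing low-confidence flows from high-confidence neighbors can be illustrated with a generic sketch. The code below is not the paper's refinement module, whose details are not given in this abstract. It assumes a hypothetical function `refine_flow`, a k-nearest-neighbor graph with Gaussian distance weights, and a per-point confidence in [0, 1] supplied by the caller. Each iteration takes one random-walk step over the graph, and each point keeps its initial estimate in proportion to its own confidence.

```python
import math


def refine_flow(points, flow, conf, k=3, sigma=1.0, steps=10):
    """Confidence-weighted random-walk smoothing of per-point 3D flow.

    ASSUMED formulation (not from the paper): transition probabilities go to
    each point's k nearest neighbors, weighted by neighbor confidence and a
    Gaussian of the distance; low-confidence points absorb neighbor flow.
    """
    n = len(points)
    transitions = []
    for i in range(n):
        nbrs = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        weights = [
            conf[j] * math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
            for j in nbrs
        ]
        total = sum(weights)
        # Row-normalize so the weights form a probability distribution;
        # an empty row means no confident neighbor is available.
        transitions.append(
            [(j, w / total) for j, w in zip(nbrs, weights)] if total > 0 else []
        )

    refined = [tuple(f) for f in flow]
    for _ in range(steps):
        updated = []
        for i in range(n):
            if not transitions[i]:
                updated.append(refined[i])  # nothing to borrow from
                continue
            # Expected neighbor flow under one random-walk step.
            avg = [sum(p * refined[j][d] for j, p in transitions[i]) for d in range(3)]
            # Blend: confident points stay near their initial estimate.
            updated.append(
                tuple(conf[i] * flow[i][d] + (1 - conf[i]) * avg[d] for d in range(3))
            )
        refined = updated
    return refined
```

In this formulation a point with confidence 1 is never changed, and a point with confidence 0 is replaced entirely by the weighted average of its neighbors. How the confidences are obtained, and how the result interacts with the flow smoothing loss, are left open here because the abstract does not specify them.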
