What type of study is this?

This is a quantitative study.

September 16, 2025

Cost-Efficient Open Vocabulary 3D Scene Understanding Based on Semantic Probability

Key Points

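The framework performs open vocabulary 3D semantic segmentation without supervision or training by using semantic probability.
It achieves competitive performance on public benchmark datasets, including ScanNet, Matterport3D, and nuScenes.
It aligns text features from the CLIP model with pixel features from 2D pre-trained models, infers a semantic probability for each pixel from their similarity, and projects it onto 3D points.
The semantic-probability design reduces computational cost relative to existing open vocabulary 3D methods and is compatible with state-of-the-art two-stage 2D pre-trained models.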
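The alignment step in the key points above can be illustrated with a short sketch. It assumes that per-pixel embeddings have already been mapped into the same space as the CLIP text embeddings, and it turns cosine similarity into per-pixel class probabilities with a temperature-scaled softmax. The function name, the softmax choice, and the temperature value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def pixel_semantic_probability(pixel_feats, text_feats, temperature=0.01):
    """Hypothetical helper: per-pixel class probabilities from feature similarity.

    pixel_feats: (H, W, D) per-pixel embeddings, assumed to share CLIP's text space.
    text_feats: (C, D) one CLIP text embedding per class prompt.
    temperature: softmax temperature (illustrative value, not from the paper).
    Returns an (H, W, C) array whose last axis sums to 1.
    """
    # L2-normalise both sides so the dot product equals cosine similarity.
    p = pixel_feats / np.linalg.norm(pixel_feats, axis=-1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)
    sim = p @ t.T  # (H, W, C) pixel-to-prompt similarity
    logits = sim / temperature
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(3, 8))  # stand-ins for three class prompts
    # A 1x2 "image" whose pixels copy prompt 1 and prompt 0 exactly.
    pixels = np.stack([text[1], text[0]])[None]
    probs = pixel_semantic_probability(pixels, text)
    print(probs.argmax(-1))  # expected: [[1 0]]
```

Taking the argmax over the class axis gives a hard label per pixel, while keeping the full probability vector preserves the confidence information that later fusion steps can exploit.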

Abstract

Traditional 3D scene understanding methods depend heavily on 3D annotation and training, which allow them to identify seen classes but leave them struggling to recognize unseen classes. In this paper, we leverage the open vocabulary inference capabilities of pre-trained models, enabling the encoding of open vocabulary concepts. Unlike existing open vocabulary 3D scene understanding methods, however, we propose a framework based on semantic probability. This innovation significantly reduces computational cost and is compatible with state-of-the-art two-stage 2D pre-trained models. Specifically, we align the text features from the CLIP model with the pixel features from the 2D pre-trained models, infer the semantic probability of image pixels based on similarity, and project it onto 3D points. Subsequently, we introduce a point cloud pair semantic fusion method to merge the point clouds, reducing the semantic probability of erroneous 3D points. Based on the probability scores, we achieve open vocabulary 3D semantic segmentation without any supervision or training. In addition, the semantic probability of 3D points can serve as pseudo-labels for 3D distillation, and the geometric features of the 3D scene can be exploited to improve segmentation performance. Experimental results demonstrate that the proposed method achieves competitive performance on publicly available benchmark datasets, including ScanNet, Matterport3D, and nuScenes.
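The step of projecting pixel probabilities onto 3D points and merging views can be sketched under simple assumptions: a pinhole camera with known intrinsics K and a 4x4 world-to-camera transform per view, no occlusion (depth) test, and plain averaging of probabilities across all views that see a point. This averaging is a generic multi-view stand-in, not the paper's point cloud pair semantic fusion method, and the function name and data layout are hypothetical.

```python
import numpy as np


def fuse_point_probabilities(points, views, num_classes):
    """Hypothetical helper: average per-pixel class probabilities onto 3D points.

    points: (N, 3) world coordinates.
    views: list of (probs, K, world_to_cam) with probs (H, W, C),
        K (3, 3) pinhole intrinsics, world_to_cam (4, 4) extrinsics.
    Returns (N, C) fused probabilities (zeros where no view sees the point)
    and (N,) counts of views that saw each point.
    Simplification: no depth test, so occluded points are not filtered.
    """
    n = points.shape[0]
    acc = np.zeros((n, num_classes))
    count = np.zeros(n)
    homo = np.hstack([points, np.ones((n, 1))])  # (N, 4) homogeneous coords
    for probs, K, world_to_cam in views:
        h, w, _ = probs.shape
        cam = (homo @ world_to_cam.T)[:, :3]  # points in the camera frame
        in_front = cam[:, 2] > 1e-6
        z = np.where(in_front, cam[:, 2], 1.0)  # avoid division by zero
        uvw = cam @ K.T  # unnormalised image coordinates
        u = np.floor(uvw[:, 0] / z).astype(int)
        v = np.floor(uvw[:, 1] / z).astype(int)
        visible = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        # Accumulate the probability vector of the pixel each point lands on.
        acc[visible] += probs[v[visible], u[visible]]
        count[visible] += 1
    fused = np.divide(acc, count[:, None], out=np.zeros_like(acc),
                      where=count[:, None] > 0)
    return fused, count
```

Training-free labels then follow from fused.argmax(axis=1) for points with a nonzero count; the same fused probabilities are the kind of per-point scores that could serve as pseudo-labels for 3D distillation.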


Cite This Study

Shen et al. (2025).

synapsesocial.com/papers/68d4539c31b076d99fa597c4
https://doi.org/10.1109/tip.2025.3607643
