3D Gaussian Splatting has recently emerged as a powerful representation for photorealistic rendering and reconstruction of complex scenes. However, its practical applications in augmented and virtual reality, digital twins, and robotics demand accurate, semantically meaningful, and structurally consistent 3D segmentation, which remains a significant challenge. Existing 3D segmentation approaches are predominantly based on multiview 2D images and frequently rely on appearance-driven criteria. This leads to semantic misclassification: distinct object parts are incorrectly merged, or coherent regions are excessively fragmented. These methods also struggle with multi-component objects and occluded scenes. To address these limitations, we propose an interactive human-in-the-loop segmentation framework that combines a fast optimization-based 3D segmentation algorithm with intuitive finger-based user interaction in a virtual reality environment. Our optimization-based segmentation module runs within a few seconds (tens of times faster than existing learning-based methods). It gives users real-time visual updates of the current segmentation, so they can refine the output interactively by adjusting prompts and viewpoints. Our finger-based interface enables precise prompting directly in 3D space, producing prompts that are consistent across views and thereby overcoming the limitations of traditional 2D multiview prompting and segmentation. Experimental results show that this combination significantly improves segmentation accuracy, semantic consistency, and robustness to occlusion and multipart structures, enabling fine-grained subpart segmentation in cluttered scenes.
Lee et al. (Mon,) studied this question.