We present NeSF, a method for producing 3D semantic fields from posed RGB images alone. In place of classical 3D representations, our method builds on work in implicit neural scene representations, wherein 3D structure is captured by point-wise functions. We leverage this methodology to recover 3D density fields, upon which we then train a 3D semantic segmentation model supervised by posed 2D semantic maps. Despite being trained on 2D signals, our method is able to generate 3D-consistent semantic maps from novel camera poses and can be queried at arbitrary 3D points. Notably, NeSF is compatible with any method producing a density field, and its accuracy improves as the quality of the density field improves. Our empirical analysis demonstrates comparable quality to competitive 2D and 3D semantic segmentation baselines on complex, realistically rendered synthetic scenes. Our method is the first to offer truly dense 3D scene segmentations requiring only 2D supervision for training, and it does not require any semantic input for inference on novel scenes. We encourage readers to visit the project website.
Vora et al. studied this question.
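To make concrete how a model that predicts semantics at 3D points could be trained against 2D semantic maps, the following is a minimal sketch, not the authors' implementation. It assumes the standard NeRF-style volume rendering weights derived from a density field, and it assumes that per-point class probabilities are composited along each camera ray with those weights; the abstract does not spell out this step. The function name `render_semantics` and all of its parameters are hypothetical.

```python
import numpy as np

def render_semantics(density, logits, deltas):
    """Composite per-sample class probabilities along a single camera ray.

    Hypothetical helper, assuming NeRF-style volume rendering.
    density: (S,) non-negative densities at S samples along the ray
    logits:  (S, C) class logits at the same samples (from some 3D semantic model)
    deltas:  (S,) distances between adjacent samples
    Returns (pixel_probs of shape (C,), weights of shape (S,)).
    """
    # Opacity contributed by each sample.
    alpha = 1.0 - np.exp(-density * deltas)
    # Transmittance: fraction of the ray that reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha
    # Numerically stable softmax over classes at each sample.
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Weighted sum of per-sample probabilities gives the 2D pixel prediction.
    return weights @ probs, weights

def pixel_cross_entropy(pixel_probs, label, eps=1e-8):
    """Cross-entropy between a rendered pixel distribution and a 2D label."""
    return -np.log(pixel_probs[label] + eps)

# Toy ray: empty space first, then an opaque surface whose class is 1.
density = np.array([0.0, 50.0, 50.0])
deltas = np.ones(3)
logits = np.eye(3) * 10.0
pixel_probs, weights = render_semantics(density, logits, deltas)
loss = pixel_cross_entropy(pixel_probs, label=1)
```

Under these assumptions, the same per-point logits can be queried directly at any 3D location, while the rendered pixel distribution is what gets compared with a 2D semantic map during training. A better density field places the weights more sharply on the true surface, which is consistent with the abstract's remark that accuracy improves with density quality.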