What question did this study set out to answer?

The study reviews advances in open-vocabulary semantic mapping and navigation for mobile robots built on foundation models, with the goal of improving human-robot interaction.

June 26, 2026 | Open Access

A Review of Open-Vocabulary Semantic Mapping and Navigation with Foundation Models for Mobile Robots

Key points

Reviewed advancements in semantic mapping and navigation using foundation models.
Categorized the developments into four areas: semantic feature fusion into 3D metric maps, object-centric open-vocabulary representations, hierarchical scene graphs, and continuous generative 3D language fields.
Analyzed the historical context and introduced various datasets, simulators, and evaluation metrics.
Identified seven open challenges in achieving robust open-vocabulary semantic mapping.
Highlighted the potential of LLMs and VLMs to improve interactions in dynamic environments.
Demonstrated the significance of integrating language-derived features with spatial representations.

Abstract

Autonomous mobile robots that coexist with humans must construct not only geometric maps but also semantic maps that can be accessed through natural language. Conventional semantic mapping has mainly focused on assigning labels from predefined closed vocabularies to metric maps, limiting its ability to handle novel objects, open-ended linguistic expressions, and flexible human-robot interaction. Recent advances in large-scale foundation models, particularly large language models (LLMs) and vision-language models (VLMs), have accelerated research on open-vocabulary semantic mapping. In parallel, generative 3D representations such as neural radiance fields and 3D Gaussian splatting have enabled dense, continuous spatial representations associated with language-derived features. Together, these developments allow robots to acquire spatial semantic representations that connect perception, language, and action. This paper reviews this rapidly evolving field through a four-part taxonomy: (i) fusion of semantic features into 3D metric maps; (ii) object-centric open-vocabulary representations; (iii) hierarchical scene graph representations; and (iv) continuous generative 3D language fields. We also revisit the history of open-vocabulary semantic mapping and provide an overview of foundation model-based navigation using language-accessible maps, ranging from object-goal navigation to LLM-based hierarchical task planning. Finally, we introduce evaluation datasets, simulators, robot platforms, and evaluation metrics, and summarize seven open challenges.
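
As a rough illustration of what "semantic maps that can be accessed through natural language" can mean in practice, the sketch below stores a language-aligned feature vector with each map element and answers a text query by cosine similarity. It is a toy example and not a method taken from the paper: the names `MapCell` and `query_map` are hypothetical, and the three-dimensional vectors are hand-written stand-ins for the embeddings a vision-language model would normally produce.

```python
import math
from dataclasses import dataclass


@dataclass
class MapCell:
    """Hypothetical map element: a 3D position plus a language-aligned feature."""
    position: tuple[float, float, float]
    feature: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_map(cells: list[MapCell], text_embedding: list[float]) -> MapCell:
    """Return the cell whose stored feature best matches the query embedding."""
    return max(cells, key=lambda c: cosine_similarity(c.feature, text_embedding))


# Toy embeddings, invented for illustration only (assumption: in a real system
# both the cell features and the query would come from the same VLM encoder).
cells = [
    MapCell((0.0, 1.0, 0.0), [0.9, 0.1, 0.0]),  # region observed near a mug
    MapCell((2.0, 0.5, 0.0), [0.1, 0.9, 0.1]),  # region observed near a sofa
    MapCell((4.0, 3.0, 0.0), [0.0, 0.2, 0.9]),  # region observed near a plant
]
query = [0.2, 0.8, 0.0]  # stand-in for the embedding of "somewhere to sit"

best = query_map(cells, query)
print(best.position)
```

Because the query is compared against features rather than a fixed label set, any phrase that can be embedded can be used to search the map, which is the property that separates open-vocabulary mapping from closed-vocabulary labeling.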
