What question did this study set out to answer?

The central aim is to develop a multimodal navigation system that aids visually impaired users in navigating indoor environments by integrating perception and language models.

May 14, 2026 | Open Access

Multimodal Navigation System for Visually Impaired Users Using Environmental Perception and Vision-Language Models

Key Points

The central aim is to develop a multimodal navigation system that aids visually impaired users in navigating indoor environments by integrating perception and language models.
The system utilizes RTAB-Map for localization and YOLO-World for object detection.
A lightweight language model is employed for semantic reasoning and interaction.
Experiments use the RePOPE benchmark and real-world navigation assessments.
The integration of perception and language reasoning improves precision by up to 2.29%.
F1-score improves consistently over baseline vision-language model approaches.
Real-world tests indicate reliable navigation, including multi-floor path planning.

Abstract

Visually impaired users face significant challenges in navigating complex indoor environments because of limited spatial awareness and a lack of real-time semantic guidance. This paper proposes a multimodal navigation system that integrates environmental perception with vision-language models (VLMs) to provide context-aware, explainable guidance without requiring additional infrastructure. The system combines RTAB-Map for localization, YOLO-World for open-vocabulary object detection, and a lightweight language model for semantic reasoning and natural language interaction. It is evaluated on the RePOPE benchmark, which assesses hallucination in vision-language understanding, and in real-world indoor navigation experiments. The results show that integrating perception with language-based reasoning improves precision by up to 2.29% and consistently enhances F1-score compared to baseline VLM approaches. The real-world experiments further demonstrate reliable navigation performance, including multi-floor path planning and obstacle-aware guidance. The proposed system thus enhances spatial understanding and reduces hallucination, providing a practical and scalable solution for assistive navigation.
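
To make the reported metrics concrete, the following is a minimal sketch of how precision, recall, and F1-score could be computed for yes/no object-presence questions. It assumes the benchmark is scored as binary classification with "yes" as the positive class. The function name `score_yes_no`, the `BinaryScores` container, and the toy answers are illustrative assumptions, not the paper's evaluation code or data.

```python
from dataclasses import dataclass


@dataclass
class BinaryScores:
    precision: float
    recall: float
    f1: float


def score_yes_no(predictions: list[str], labels: list[str]) -> BinaryScores:
    """Score yes/no answers, treating "yes" as the positive class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")

    pairs = list(zip(predictions, labels))
    # A "yes" for an absent object is a false positive, i.e. a hallucination.
    tp = sum(1 for p, y in pairs if p == "yes" and y == "yes")
    fp = sum(1 for p, y in pairs if p == "yes" and y == "no")
    fn = sum(1 for p, y in pairs if p == "no" and y == "yes")

    # Guard against empty denominators by falling back to 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return BinaryScores(precision, recall, f1)


if __name__ == "__main__":
    # Toy answers, made up for this example only.
    model_answers = ["yes", "yes", "no", "yes", "no"]
    ground_truth = ["yes", "no", "no", "yes", "yes"]
    print(score_yes_no(model_answers, ground_truth))
```

Under this scoring, every hallucinated object lowers precision, which is why a precision gain is a natural indicator of reduced hallucination.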
