Visually impaired users face significant challenges in navigating complex indoor environments because of limited spatial awareness and the lack of real-time semantic guidance. This paper proposes a multimodal navigation system that integrates environmental perception with vision-language models (VLMs) to provide context-aware, explainable guidance without requiring additional infrastructure. The system combines RTAB-Map for localization, YOLO-World for open-vocabulary object detection, and a lightweight language model for semantic reasoning and natural-language interaction. It is evaluated on the RePOPE benchmark, which measures hallucination in vision-language understanding, and in real-world indoor navigation experiments. Compared with baseline VLM approaches, integrating perception with language-based reasoning improves precision by up to 2.29% and consistently improves F1-score. The real-world experiments further demonstrate reliable navigation performance, including multi-floor path planning and obstacle-aware guidance. These results indicate that the proposed system enhances spatial understanding and reduces hallucination, offering a practical and scalable solution for assistive navigation.
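The abstract reports hallucination results as precision and F1-score on RePOPE. RePOPE is a relabeled version of the POPE benchmark, which asks yes/no object-existence questions such as "Is there a chair in the image?". The minimal Python sketch below shows how these metrics are conventionally computed from binary answers, where "yes" is treated as the positive class. It assumes RePOPE keeps POPE's yes/no format. The function name `pope_style_metrics` and the answer format are illustrative assumptions, not the authors' evaluation code.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float
    yes_ratio: float  # fraction of questions the model answered "yes"


def pope_style_metrics(predictions, labels):
    """Compute POPE-style metrics from yes/no answers (hypothetical helper).

    predictions: model answers, e.g. ["yes", "no", ...]
    labels: ground-truth answers in the same order
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not predictions:
        raise ValueError("no answers to score")

    # Normalize answers so "Yes " and "yes" count the same.
    preds = [p.strip().lower() for p in predictions]
    golds = [g.strip().lower() for g in labels]

    # Confusion-matrix counts with "yes" as the positive class.
    tp = sum(p == "yes" and g == "yes" for p, g in zip(preds, golds))
    fp = sum(p == "yes" and g == "no" for p, g in zip(preds, golds))
    fn = sum(p == "no" and g == "yes" for p, g in zip(preds, golds))
    tn = sum(p == "no" and g == "no" for p, g in zip(preds, golds))
    n = len(preds)

    # A hallucinated object (a "yes" for an absent object) is a false positive,
    # so fewer hallucinations show up directly as higher precision.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    return BinaryMetrics(
        accuracy=(tp + tn) / n,
        precision=precision,
        recall=recall,
        f1=f1,
        yes_ratio=(tp + fp) / n,
    )


if __name__ == "__main__":
    m = pope_style_metrics(
        ["yes", "yes", "no", "no", "yes"],
        ["yes", "no", "no", "yes", "yes"],
    )
    print(m)
```

In this toy example two "yes" answers are correct, one is a hallucination, and one present object is missed, so precision, recall, and F1 all equal 2/3. Because a hallucinated "yes" counts as a false positive, grounding VLM answers in detector output would be expected to move precision most directly. That fits the abstract reporting its gain as precision.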
Huei-Yung Lin
Yunlong Fan
Chin‐Chen Chang
Sensors
National Taipei University of Technology
National United University
Lin et al. (Tue,) studied this question.
www.synapsesocial.com/papers/6a056751a550a87e60a1f5ce — DOI: https://doi.org/10.3390/s26103045