February 22, 2024

Zero-shot Object Navigation with Vision-Language Foundation Models Reasoning

Abstract

This research introduces a novel method for zero-shot object navigation, enabling agents to navigate unexplored environments. Traditional methods often fail in new settings because they depend on large navigation datasets for training. We instead use Large Vision Language Models (LVLMs) to help agents understand and move through unfamiliar visual environments without prior experience. The process has two steps: a pretrained LVLM first performs object detection to build a semantic map, and the LVLM is then queried again to predict the likely location of the target object. Our experiments on the RoboTHOR benchmark show improved performance, with a 1.8% increase in both Success Rate and Success Weighted by Path Length (SPL) compared to the existing best method, ESC.
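
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Success Rate and SPL are commonly computed, using the standard definition SPL = (1/N) Σ S_i · l_i / max(p_i, l_i), where S_i is the success indicator, l_i the shortest-path length, and p_i the length of the path the agent actually took. The function names, argument names, and the handling of zero-length episodes are illustrative assumptions, not taken from the paper's evaluation code.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the agent reached the target."""
    if not successes:
        raise ValueError("at least one episode is required")
    return sum(successes) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    Each successful episode contributes l / max(p, l); failures contribute 0.
    """
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if len(shortest_lengths) != n or len(path_lengths) != n:
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for ok, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if not ok:
            continue
        denom = max(taken, shortest)
        # Assumption: a successful zero-length episode counts as fully efficient.
        total += shortest / denom if denom > 0 else 1.0
    return total / n


# Hypothetical episodes: two successes (one with a detour), one failure.
episodes_ok = [True, True, False]
episodes_shortest = [2.0, 3.0, 4.0]
episodes_taken = [4.0, 3.0, 10.0]

print(success_rate(episodes_ok))                        # 2 of 3 episodes succeed
print(spl(episodes_ok, episodes_shortest, episodes_taken))  # (0.5 + 1.0 + 0.0) / 3 = 0.5
```

Because SPL discounts each success by how far the agent's path exceeded the shortest one, it can never be higher than Success Rate on the same set of episodes.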
