We present DyNaVLM, an end-to-end vision-language navigation framework built on Vision-Language Models (VLMs). In contrast to prior methods constrained by fixed angular or distance intervals, our system lets agents freely select navigation targets via visual-language reasoning. At its core lies a self-refining graph memory that 1) stores object locations as executable topological relations, 2) enables cross-robot memory sharing through distributed graph updates, and 3) enhances the VLM's decision-making via retrieval augmentation. Operating without task-specific training or fine-tuning, DyNaVLM demonstrates high performance on the GOAT and ObjectNav benchmarks. Real-world tests further validate its robustness and generalization. The system's three innovations (dynamic action space formulation, collaborative graph memory, and training-free deployment) establish a new paradigm for scalable embodied robots, bridging the gap between discrete VLN tasks and continuous real-world navigation.
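As a rough illustration of the three roles the abstract assigns to the graph memory (storing object relations, sharing across robots, and retrieval for the VLM), the following minimal Python sketch may help. It is not the authors' implementation: the class `GraphMemory`, its methods, the "newest observation wins" merge rule, and the text format of the retrieved facts are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class GraphMemory:
    """Toy topological memory (hypothetical): nodes are objects, edges are relations."""
    # object name -> ((x, y) position estimate, timestamp of the observation)
    nodes: dict = field(default_factory=dict)
    # (subject, relation, object) triples, e.g. ("mug", "on", "table")
    edges: set = field(default_factory=set)

    def observe(self, name, position, stamp):
        """Store or refresh an object, keeping only the most recent observation."""
        known = self.nodes.get(name)
        if known is None or stamp >= known[1]:
            self.nodes[name] = (position, stamp)

    def relate(self, subject, relation, obj):
        """Record a spatial relation between two objects."""
        self.edges.add((subject, relation, obj))

    def merge(self, other):
        """Fold another robot's memory into this one (assumed rule: newest wins)."""
        for name, (position, stamp) in other.nodes.items():
            self.observe(name, position, stamp)
        self.edges |= other.edges

    def retrieve(self, goal):
        """Return text facts mentioning the goal, suitable for prepending to a VLM prompt."""
        facts = []
        if goal in self.nodes:
            (x, y), _ = self.nodes[goal]
            facts.append(f"{goal} last seen at ({x:.1f}, {y:.1f})")
        for subject, relation, obj in sorted(self.edges):
            if goal in (subject, obj):
                facts.append(f"{subject} {relation} {obj}")
        return facts


# Usage: robot B has seen a mug; robot A inherits that knowledge through a merge.
robot_a, robot_b = GraphMemory(), GraphMemory()
robot_a.observe("table", (1.0, 2.0), stamp=1)
robot_b.observe("mug", (1.2, 2.1), stamp=2)
robot_b.relate("mug", "on", "table")
robot_a.merge(robot_b)
print(robot_a.retrieve("mug"))
```

In this sketch the retrieved facts are plain strings, on the assumption that they would be added to the VLM prompt as context before it picks a navigation target.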
Zehua Ji
Huangxuan Lin
Yue Gao
Ji et al. (Wed,) studied this question.
www.synapsesocial.com/papers/68f6379bb481a140a36cf67d — DOI: https://doi.org/10.48550/arxiv.2506.15096