We address Vision-Language Navigation in Continuous Environments (VLN-CE) in the zero-shot setting. Zero-shot VLN-CE is particularly challenging because no expert demonstrations are available for training and only minimal structural priors about the environment are available to guide navigation. To confront these challenges, we propose the Constraint-Aware Navigator (CA-Nav), which reframes zero-shot VLN-CE as a sequential, constraint-aware sub-instruction completion process. CA-Nav continuously translates sub-instructions into navigation plans using two core modules: the Constraint-Aware Sub-instruction Manager (CSM) and the Constraint-Aware Value Mapper (CVM). CSM defines the completion criteria for decomposed sub-instructions as constraints and tracks navigation progress by switching sub-instructions in a constraint-aware manner. CVM, guided by CSM's constraints, generates a value map on the fly and refines it using superpixel clustering to improve navigation stability. CA-Nav achieves state-of-the-art performance on two VLN-CE benchmarks, surpassing the previous best method by 12% and 13% in Success Rate on the validation unseen splits of R2R-CE and RxR-CE, respectively. Moreover, CA-Nav demonstrates its effectiveness in real-world robot deployments across various indoor scenes and instructions.
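The two ideas in the abstract, constraint-aware switching between sub-instructions and region-wise refinement of a value map, can be illustrated with a minimal sketch. This is not the authors' implementation: the class and function names, the observation keys (`room`, `objects`), and the rule "switch when every constraint holds" are assumptions made for illustration, since the abstract does not specify them. The region labels are assumed to come from some superpixel algorithm run beforehand; the sketch only averages values within each region.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical types: an observation is a plain dict, and a constraint is a
# predicate over it. The paper's actual constraint definitions may differ.
Observation = Dict[str, object]
Constraint = Callable[[Observation], bool]


@dataclass
class SubInstruction:
    text: str
    constraints: List[Constraint] = field(default_factory=list)


class SubInstructionManager:
    """Tracks progress through an ordered list of sub-instructions."""

    def __init__(self, sub_instructions: List[SubInstruction]) -> None:
        self.sub_instructions = list(sub_instructions)
        self.index = 0

    @property
    def current(self) -> Optional[SubInstruction]:
        # None means every sub-instruction has been completed.
        if self.index < len(self.sub_instructions):
            return self.sub_instructions[self.index]
        return None

    def update(self, obs: Observation) -> Optional[SubInstruction]:
        # Assumed switching rule: advance once all constraints of the current
        # sub-instruction hold. A sub-instruction with no constraints is
        # treated as complete immediately.
        cur = self.current
        if cur is not None and all(check(obs) for check in cur.constraints):
            self.index += 1
        return self.current


def smooth_by_region(values: List[List[float]],
                     labels: List[List[int]]) -> List[List[float]]:
    """Replace each cell's value with the mean value of its labeled region."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for value_row, label_row in zip(values, labels):
        for value, label in zip(value_row, label_row):
            sums[label] = sums.get(label, 0.0) + value
            counts[label] = counts.get(label, 0) + 1
    return [[sums[label] / counts[label] for label in label_row]
            for label_row in labels]
```

In this sketch, a plan such as `[SubInstruction("walk to the kitchen", [lambda o: o.get("room") == "kitchen"])]` stays on its first entry until an observation reports the kitchen. Averaging within regions removes cell-level noise inside each region, which is one plausible reading of how superpixel clustering could stabilize a value map.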
K. Chen
Dong An
Yan Huang
IEEE Transactions on Pattern Analysis and Machine Intelligence
University of Chinese Academy of Sciences
Institute of Automation
Mohamed bin Zayed University of Artificial Intelligence
Chen et al. studied this question.
www.synapsesocial.com/papers/68c19f7f54b1d3bfb60dabad — DOI: https://doi.org/10.1109/tpami.2025.3594204