Visual Language Navigation (VLN) is a rapidly evolving, cross-disciplinary field that has grown remarkably since its recent inception. Leveraging deep learning, it aims to endow embodied agents with auditory and visual perception, enabling them to interpret multi-modal cues. Although the end objective remains the same, the ways this problem is approached within the broader context of artificial intelligence differ substantially in scope. Consequently, a wide range of contributions has been proposed in the literature, including architectural innovations, novel models, improved training strategies, datasets, and methodologies. This work therefore aims to provide a comprehensive review of these scientific contributions and advancements by categorizing the literature into four primary abstraction layers: frameworks, models, auxiliaries, and tasks. Within each abstraction layer, the literature is further sub-categorized, analyzed for notable trends, strengths, and weaknesses, and cross-compared for respective advantages and limitations. The simulation environments and datasets are consolidated, and insights into each are presented along with a summary of their advantages and disadvantages. Through extensive analysis, crucial challenges that remain in VLN are identified, guiding future efforts toward more intelligent and adaptable navigation agents.
Jahanzeb Tariq Khan
Nayyer Aafaq
Qasim Ali
National University of Sciences and Technology
Air University
Khan et al. studied this question.
www.synapsesocial.com/papers/69c08bcaa48f6b84677f9add — DOI: https://doi.org/10.1007/s10791-026-09977-z