
April 3, 2026

Dream to Recall: Imagination-Guided Experience Retrieval for Memory-Persistent Vision-and-Language Navigation

Key Points

The research aims to improve memory access mechanisms in vision-and-language navigation (VLN) through imagination-guided retrieval.
Developed Memoir, a model that uses imagined future navigation states as queries for memory retrieval.
Created a language-conditioned world model that both encodes experiences for storage and generates retrieval queries.
Implemented Hybrid Viewpoint-Level Memory, which anchors both environmental observations and behavioral patterns to viewpoints.
Evaluated Memoir on diverse memory-persistent VLN benchmarks covering 10 distinct testing scenarios.
Achieved a 5.4% SPL gain on IR2R over the best memory-persistent baseline.
Demonstrated an 8.3× training speedup over previous methods.
Showed a 74% reduction in inference memory usage.
Analysis revealed substantial headroom for the imagination-guided paradigm (73.3% vs. a 93.4% upper bound).
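To make the Hybrid Viewpoint-Level Memory idea concrete, here is a minimal sketch. It assumes three things: viewpoints are identified by string IDs, observations are stored as plain feature vectors, and behavioral patterns are recorded as (episode, action) pairs. The class and method names (`HybridViewpointMemory`, `add_observation`, `add_behavior`, `lookup`) are hypothetical. The summary does not describe Memoir's actual data layout or how its encoders consume these entries.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ViewpointEntry:
    """Everything remembered about one viewpoint (hypothetical layout)."""
    observations: list = field(default_factory=list)  # e.g. visual feature vectors
    behaviors: list = field(default_factory=list)     # e.g. (episode_id, action) records


class HybridViewpointMemory:
    """Sketch of a memory keyed by viewpoint ID that holds observations and behaviors together."""

    def __init__(self):
        # A missing viewpoint gets an empty entry the first time something is stored for it.
        self._entries = defaultdict(ViewpointEntry)

    def add_observation(self, viewpoint_id, feature):
        self._entries[viewpoint_id].observations.append(feature)

    def add_behavior(self, viewpoint_id, episode_id, action):
        # Anchor the decision the agent made here to the viewpoint itself.
        self._entries[viewpoint_id].behaviors.append((episode_id, action))

    def lookup(self, viewpoint_id):
        # dict.get does not trigger defaultdict's factory, so unseen IDs return None.
        return self._entries.get(viewpoint_id)

    def __len__(self):
        return len(self._entries)


memory = HybridViewpointMemory()
memory.add_observation("vp_12", [0.1, 0.9])
memory.add_behavior("vp_12", episode_id=3, action="turn_left")
memory.add_behavior("vp_12", episode_id=5, action="forward")
entry = memory.lookup("vp_12")
print(len(entry.observations), entry.behaviors)  # prints 1 [(3, 'turn_left'), (5, 'forward')]
```

Both memory types are keyed by viewpoint. As a result, a single lookup returns what the agent saw at a location and what it previously did there, which is the pairing the key points describe.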

Abstract

Vision-and-Language Navigation (VLN) requires agents to follow natural-language instructions through environments, and memory-persistent variants additionally demand progressive improvement through accumulated experience. Existing approaches for memory-persistent VLN face critical limitations. They lack effective memory access mechanisms, relying instead on incorporating the entire memory or on fixed-horizon lookup. They also predominantly store only environmental observations while neglecting navigation behavioral patterns that encode valuable decision-making strategies. We present Memoir, which employs imagination as a retrieval mechanism grounded by explicit memory: a world model imagines future navigation states as queries to selectively retrieve relevant environmental observations and behavioral histories. The approach comprises: 1) a language-conditioned world model whose imagined future states serve two purposes, encoding experiences for storage and generating retrieval queries; 2) Hybrid Viewpoint-Level Memory, which anchors both observations and behavioral patterns to viewpoints, enabling hybrid retrieval; and 3) an experience-augmented navigation model that integrates retrieved knowledge through specialized encoders. Extensive evaluation across diverse memory-persistent VLN benchmarks with 10 distinct testing scenarios demonstrates Memoir's effectiveness. It improves significantly across all scenarios, with a 5.4% SPL gain on IR2R over the best memory-persistent baseline, accompanied by an 8.3× training speedup and a 74% reduction in inference memory. The results validate that predictive retrieval of both environmental and behavioral memories enables more effective navigation, with analysis indicating substantial headroom (73.3% vs. a 93.4% upper bound) for this imagination-guided paradigm.
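The abstract's retrieval idea uses imagined future states as queries instead of a fixed window of recent steps. A toy sketch can illustrate it, but everything here is an assumption for illustration:

- `imagine` is a hypothetical stand-in for the language-conditioned world model; it simply moves a latent state along an instruction vector.
- The viewpoint embeddings are made-up numbers.
- Cosine similarity with top-k selection is a generic retrieval choice, not necessarily the one Memoir uses.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def imagine(state, instruction, steps):
    """Hypothetical toy world model: the k-th imagined state is the
    current state shifted k times along the instruction vector."""
    return [[s + k * i for s, i in zip(state, instruction)] for k in range(1, steps + 1)]


def retrieve(queries, memory_keys, top_k=1):
    """For each imagined state, collect the IDs of the top_k most similar
    stored viewpoints, deduplicated and kept in first-seen order."""
    hits = []
    for q in queries:
        ranked = sorted(memory_keys, key=lambda vp: cosine(q, memory_keys[vp]), reverse=True)
        for vp in ranked[:top_k]:
            if vp not in hits:
                hits.append(vp)
    return hits


# Stored viewpoint embeddings (made-up numbers for illustration).
memory_keys = {
    "kitchen": [1.0, 0.0],
    "hallway": [0.7, 0.7],
    "bedroom": [0.0, 1.0],
}
current = [1.0, 0.0]
instruction = [0.0, 1.0]  # toy encoding of an instruction such as "walk toward the bedroom"
queries = imagine(current, instruction, steps=3)
print(retrieve(queries, memory_keys))  # prints ['hallway', 'bedroom']
```

The imagined trajectory first matches the hallway and then the bedroom, so only viewpoints on the route the agent expects to take are retrieved. A fixed-horizon lookup would instead return the most recent entries regardless of the instruction. In a fuller system, each retrieved viewpoint ID could then be used to fetch both its stored observations and its behavioral records.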
