Large language models face challenges in long-context question answering, where key evidence of a query may be dispersed across millions of tokens. Existing works equip large language models with a memory corpus that is dynamically updated during a single-pass document scan, also known as the "memorize while reading" methods. While this approach scales efficiently, it suffers from irreversible forward-only processing, information loss through overwriting, and sparse reinforcement learning signals. To tackle these challenges, we present ReMemR1, a memory-augmented agent with callback-enhanced memory that allows selective retrieval from the entire memory history and allows non-linear reasoning and revisiting of early evidence. To further strengthen training, we propose Reinforcement Learning with Multi-Level Rewards (RLMLR), which combines final-answer rewards with dense, step-level signals that guide effective memory use. Together, these contributions mitigate information degradation, improve supervision, and support multi-hop memory utilizing. Experiments on long-document QA show significant gains over existing memory-based approaches, which validates ReMemR1 as an effective solution for long-context reasoning agents.
Building similarity graph...
Analyzing shared references across papers
Loading...
Yaorui Shi
Yuxin Chen
Siyuan Wang
Building similarity graph...
Analyzing shared references across papers
Loading...
Shi et al. (Sat,) studied this question.
www.synapsesocial.com/papers/68f6196ee0bbbc94fac36480 — DOI: https://doi.org/10.48550/arxiv.2509.23040