Memory Archive: A Memory-Grounded Training Paradigm for Computer Use Agents

This paper introduces the Memory Archive training paradigm, an end-to-end data architecture and training pipeline that addresses the structural failures of standard Computer Use Agent (CUA) training. Most current CUA systems rely on behavioural cloning followed by outcome-supervised reinforcement learning (RL). This leads to intent blindness and a severe representational mismatch between the formats used in training and in deployment.

The central thesis of the paradigm is Format Consistency. The system is built around a compiled task guide called 'memory.md'. This is a structured document containing step-by-step procedural reasoning, execution commands, and visual state references. The architecture threads this single artifact through four stages of the agent lifecycle:

Pre-Training (Format Internalization): The base model learns the grammar of GUI actuation events and step-level multimodal alignment.
Supervised Fine-Tuning (SFT): The model is trained with retrieved memories in context. Actuation artifacts ('CommandEvent' JSONs) are treated as first-class training targets alongside reasoning.
Post-Training (Memory Adherence RL): The policy is optimized with Group Relative Policy Optimization (GRPO). Training is driven by a novel three-component reward function (Step Alignment, Visual Grounding, and Outcome Consistency) and a VLM-generated Process Reward Model (PRM).
Inference-Time Retrieval: A two-stage retrieval stack dynamically retrieves relevant memories. The first stage is a bi-encoder over an HNSW index, and the second is a cross-encoder. The agent tracks execution deviation and, upon task success, autonomously compiles new 'memory.md' files, endogenously growing its own training corpus.

The paradigm also introduces in-training evaluation via self-generated memories. This lets researchers detect overfitting and underfitting and assess context-awareness without relying on static external benchmarks.
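The abstract names the three reward components but does not say how they are combined. The Python below is a minimal sketch of the post-training signal under stated assumptions. Each component score is assumed to be already normalized to [0, 1]. The components are assumed to be combined linearly with hypothetical weights (0.4, 0.3, 0.3). The names `RewardWeights`, `memory_adherence_reward`, and `group_relative_advantages` are illustrative, not the paper's. The VLM-generated PRM signal is not modeled. The advantage uses the population standard deviation of the group plus a small `eps` for numerical stability.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights: the abstract does not specify the weighting.
    step_alignment: float = 0.4
    visual_grounding: float = 0.3
    outcome_consistency: float = 0.3


def memory_adherence_reward(step_alignment: float,
                            visual_grounding: float,
                            outcome_consistency: float,
                            weights: RewardWeights = RewardWeights()) -> float:
    """Linearly combine three component scores, each assumed to be in [0, 1]."""
    for score in (step_alignment, visual_grounding, outcome_consistency):
        if not 0.0 <= score <= 1.0:
            raise ValueError("component scores are assumed to lie in [0, 1]")
    return (weights.step_alignment * step_alignment
            + weights.visual_grounding * visual_grounding
            + weights.outcome_consistency * outcome_consistency)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: standardize each reward against its own rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts of the same task, scored on the three components.
group = [
    memory_adherence_reward(1.0, 0.8, 1.0),
    memory_adherence_reward(0.5, 0.6, 0.0),
    memory_adherence_reward(0.9, 0.9, 1.0),
    memory_adherence_reward(0.2, 0.1, 0.0),
]
advantages = group_relative_advantages(group)
```

Standardizing within each group of rollouts for the same task lets GRPO use the group itself as the baseline instead of a learned value critic. A rollout earns a positive advantage only by outscoring its siblings. A group in which every rollout scores the same therefore contributes zero advantage.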
This document provides full mathematical formulations, data construction specifications, algorithm details, and hyperparameter guidance for implementing the architecture.
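The inference-time stack follows the familiar retrieve-then-rerank pattern, which the small, dependency-free Python sketch below illustrates. It assumes precomputed embeddings and makes two substitutions. Exact cosine search stands in for the HNSW approximate index, and a word-overlap scorer stands in for the cross-encoder. The memory file names, the toy vectors, and the `k_candidates`/`k_final` cut-offs are hypothetical. The helper names `cosine`, `retrieve_memories`, and `overlap_score` are illustrative only.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_memories(query: str,
                      query_vec: Vector,
                      memory_vecs: dict[str, Vector],
                      memory_texts: dict[str, str],
                      cross_score: Callable[[str, str], float],
                      k_candidates: int = 20,
                      k_final: int = 3) -> list[str]:
    # Stage 1 (recall): bi-encoder similarity. Exact search stands in for HNSW.
    candidates = sorted(memory_vecs,
                        key=lambda mid: cosine(query_vec, memory_vecs[mid]),
                        reverse=True)[:k_candidates]
    # Stage 2 (precision): score (query, memory) pairs jointly, shortlist only.
    reranked = sorted(candidates,
                      key=lambda mid: cross_score(query, memory_texts[mid]),
                      reverse=True)
    return reranked[:k_final]


def overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query words found in doc."""
    q_words = set(query.lower().split())
    return len(q_words & set(doc.lower().split())) / max(len(q_words), 1)


# Hypothetical memory corpus with toy 3-dimensional embeddings.
memory_texts = {
    "export_pdf.md": "open file menu then export as pdf",
    "rename_file.md": "right click file and choose rename",
    "dark_mode.md": "open settings and enable dark mode",
}
memory_vecs = {
    "export_pdf.md": [0.9, 0.1, 0.0],
    "rename_file.md": [0.7, 0.3, 0.1],
    "dark_mode.md": [0.0, 0.2, 0.9],
}
top = retrieve_memories("export the report as pdf", [0.7, 0.3, 0.1],
                        memory_vecs, memory_texts, overlap_score,
                        k_candidates=2, k_final=1)
```

In this example, the first stage ranks `rename_file.md` above `export_pdf.md`, and the reranker reverses that order. This is the usual division of labour: the cheap first stage bounds how many pairs the expensive second stage has to score.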
Kartik A (Thu,) studied this question.