Empathetic dialogue systems built on large language models overwhelmingly adopt a monolithic inference paradigm: emotion perception, causal reasoning, memory retrieval, and response planning are handled within a single forward pass, with no architecturally enforced intermediate representations. This design forfeits intermediate-state transparency and long-horizon personalization. Drawing on neuroscientific and cognitive–psychological evidence that human empathy is functionally dissociable, we present MOSAIC (Multi-agent Orchestration with Structured Affective memory for Interpretable empathiC dialogue), a training-free framework that operationalizes empathetic dialogue as a four-stage cognitive pipeline: affective perception, causal appraisal, episodic memory retrieval, and response synthesis. Three innovations distinguish MOSAIC from prior work: (1) a cognitively motivated modular architecture whose functionally dissociable stages enable post hoc failure attribution through logged intermediate states; (2) a hierarchical three-tier emotional memory (perceptual, semantic, and episodic) coupled with adaptive three-dimensional retrieval over emotion, situation, and coping-strategy cues; and (3) a heterogeneous model orchestration strategy that coordinates open-source and API-accessible models through role-specific chain-of-thought prompts and requires no task-specific fine-tuning. We note that the EmpatheticDialogues evaluation pre-populates the memory store with 200 training-split episodes before test-set interaction, a data-access asymmetry relative to single-model baselines that must be borne in mind when interpreting comparative results. Experiments on EmpatheticDialogues and ESConv show that MOSAIC achieves a weighted F1 of 76.4% and an empathy score of 3.87 (on a 1–5 Likert scale), and that it improves over single-model, training-free baselines on aggregate empathy and, most prominently, on human-rated personalization (3.67 vs. 3.24 against Claude-3.5 five-shot, d=0.48).
We caution that the comparison against training-free baselines is not controlled for data access (see the cold-start discussion in Methods). The personalization advantage, which is supported by the ablation without the Event Agent, is the result we treat as the primary practical contribution of this work.