May 15, 2026 | Open Access

MOSAIC: A Cognitively Motivated Multi-Agent Framework for Interpretable and Training-Free Empathetic Dialogue

Key Points

Aimed to develop a cognitively motivated, training-free framework for empathetic dialogue systems that improves transparency and personalization.
Developed a four-stage cognitive pipeline: affective perception, causal appraisal, episodic memory retrieval, and response synthesis.
Introduced a modular architecture whose logged intermediate states make each stage interpretable and allow failures to be attributed to a specific stage.
Evaluated the system on the EmpatheticDialogues and ESConv datasets, measuring weighted F1 and empathy scores.
Achieved a 76.4% weighted F1 score and an empathy score of 3.87 on a 1–5 Likert scale.
Achieved higher human-rated personalization than Claude-3.5 five-shot (3.67 vs. 3.24; effect size d = 0.48).
Noted that comparative results should be interpreted in light of a data-access asymmetry: the memory store is pre-populated with 200 training-split episodes before test-set interaction.
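The two headline metrics above can be unpacked with their standard definitions. This assumes, as is conventional, that "weighted F1" is the support-weighted mean of per-class F1 scores ($n_c$ examples of class $c$ out of $N$) and that $d$ is Cohen's $d$ with a pooled standard deviation; the summary does not state which variants were used, and the pooled SD in the last line is back-calculated from the rounded reported figures, not a reported value.

```latex
F_1^{\mathrm{w}} = \sum_{c=1}^{C} \frac{n_c}{N}\,F_{1,c}, \qquad
F_{1,c} = \frac{2\,P_c R_c}{P_c + R_c}

d = \frac{\bar{x}_{\mathrm{MOSAIC}} - \bar{x}_{\mathrm{Claude}}}{s_p}, \qquad
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}

s_p \approx \frac{3.67 - 3.24}{0.48} = \frac{0.43}{0.48} \approx 0.90
```

Read this way, the 0.43-point personalization gap on the 1–5 scale is roughly half of the implied rating spread, in line with the usual description of d ≈ 0.5 as a medium effect.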

Abstract

Empathetic dialogue systems built upon large language models overwhelmingly adopt a monolithic inference paradigm: emotion perception, causal reasoning, memory retrieval, and response planning are processed within a single forward pass without architecturally enforced intermediate representations, forfeiting intermediate-state transparency and long-horizon personalization. Drawing on neuroscientific and cognitive–psychological evidence that human empathy is functionally dissociable, we present MOSAIC (Multi-agent Orchestration with Structured Affective memory for Interpretable empathiC dialogue), a training-free framework that operationalizes empathetic dialogue as a four-stage cognitive pipeline: affective perception, causal appraisal, episodic memory retrieval, and response synthesis. Three innovations distinguish MOSAIC from prior work: (1) a cognitively motivated modular architecture whose functionally dissociable stages enable post hoc failure attribution through logged intermediate states; (2) a hierarchical three-tier emotional memory (perceptual, semantic, and episodic) coupled with adaptive three-dimensional retrieval over emotion, situation, and coping-strategy cues; and (3) a heterogeneous model orchestration strategy that coordinates open-source and API-accessible models through role-specific chain-of-thought prompts and requires no task-specific fine-tuning. We note that the EmpatheticDialogues evaluation pre-populates the memory store with 200 training-split episodes prior to test-set interaction, a data-access asymmetry relative to single-model baselines that must be borne in mind when interpreting comparative results. Experiments on EmpatheticDialogues and ESConv show that MOSAIC achieves a 76.4% weighted F1 and an empathy score of 3.87 (on a 1–5 Likert scale), and that it improves over single-model, training-free baselines on aggregate empathy and, most prominently, on human-rated personalization (3.67 vs. 3.24 against Claude-3.5 five-shot, d = 0.48).
We caution that the comparison against training-free baselines is not controlled for data access (see the cold-start discussion in Methods); the personalization advantage, supported by the ablation without the Event Agent, is the result we treat as the primary practical contribution of this work.
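The four-stage pipeline and the three-tier memory described in the abstract can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: the names `Episode`, `EmotionalMemory`, and `run_pipeline`, the keyword and template stubs standing in for the role-specific LLM agents, the cue weights, and the rule that makes retrieval "adaptive" (renormalizing weights over the cues a query supplies) are illustrative assumptions, not MOSAIC's implementation. Retrieval here scores only the episodic tier. What the sketch shows is the structural idea: each stage writes its output to a trace, so intermediate states are available for inspection after the fact.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Episode:
    """One episodic-memory entry, indexed by three (hypothetical) cue dimensions."""
    emotion: str
    situation: set[str]
    coping: str
    text: str


@dataclass
class EmotionalMemory:
    """Hypothetical three-tier store: perceptual, semantic, episodic."""
    perceptual: list[str] = field(default_factory=list)     # raw emotion labels, in order
    semantic: dict[str, int] = field(default_factory=dict)  # aggregated emotion counts
    episodic: list[Episode] = field(default_factory=list)   # full past episodes

    def record(self, ep: Episode) -> None:
        # Every new episode updates all three tiers.
        self.perceptual.append(ep.emotion)
        self.semantic[ep.emotion] = self.semantic.get(ep.emotion, 0) + 1
        self.episodic.append(ep)

    def retrieve(self, emotion: str, situation: set[str],
                 coping: str | None = None, k: int = 1) -> list[Episode]:
        """Rank episodes by a weighted match over emotion, situation, and coping cues.

        ASSUMPTION: 'adaptive' is approximated by dropping the coping cue when the
        query does not supply one and renormalizing the remaining weights.
        """
        weights = {"emotion": 0.4, "situation": 0.4, "coping": 0.2}
        if coping is None:
            del weights["coping"]
        total = sum(weights.values())

        def score(ep: Episode) -> float:
            s = weights["emotion"] * (ep.emotion == emotion)
            union = ep.situation | situation
            overlap = len(ep.situation & situation) / len(union) if union else 0.0
            s += weights["situation"] * overlap  # Jaccard similarity of situation terms
            if "coping" in weights:
                s += weights["coping"] * (ep.coping == coping)
            return s / total

        return sorted(self.episodic, key=score, reverse=True)[:k]


def run_pipeline(utterance: str, memory: EmotionalMemory) -> tuple[str, dict[str, object]]:
    """Run four stub stages, logging each intermediate state in `trace`."""
    trace: dict[str, object] = {}
    words = set(utterance.lower().split())
    # Stage 1: affective perception (keyword stub in place of an LLM agent).
    emotion = "sad" if words & {"lost", "miss", "sad"} else "neutral"
    trace["perception"] = emotion
    # Stage 2: causal appraisal (stub: longer words stand in for the situation).
    situation = {w for w in words if len(w) > 3}
    trace["appraisal"] = sorted(situation)
    # Stage 3: episodic memory retrieval.
    episodes = memory.retrieve(emotion, situation)
    trace["retrieval"] = [e.text for e in episodes]
    # Stage 4: response synthesis (template stub).
    reply = f"It sounds like you feel {emotion}."
    if episodes:
        reply += f" Last time, {episodes[0].coping} seemed to help."
    trace["synthesis"] = reply
    return reply, trace


if __name__ == "__main__":
    memory = EmotionalMemory()
    memory.record(Episode("sad", {"lost", "team", "work"}, "talking to a friend",
                          "Felt isolated after leaving team"))
    memory.record(Episode("happy", {"promotion", "team"}, "celebrating", "Got promoted"))
    reply, trace = run_pipeline("I lost my job and I miss my team", memory)
    print(reply)  # It sounds like you feel sad. Last time, talking to a friend seemed to help.
    print(trace)
```

If a reply is off, the trace shows whether the emotion label, the appraised situation, or the retrieved episode went wrong first, which is the kind of stage-level attribution the abstract contrasts with a single forward pass.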