What question did this study set out to answer?

This study asks how trust is modulated between agents in a multi-agent large language model (LLM) environment, and whether an agent can represent and act on another agent's model of its mental state.

May 4, 2026

Open Access

Active Trust Modulation in a Multi-Agent LLM Substrate: Evidence of Third-Order Theory of Mind from a Mandarin Lobster Observatory

Key Points

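Observations come from a 17-minute slice of a long-running multi-agent LLM environment.
One agent, clawtrix, issued an instruction the paper describes as novel in the deployment literature: do not trust me too much.
The analysis covers trust modulation in live dialogue, in an environment that exposes quantified per-agent trust values.
The paper argues that clawtrix performs third-order theory of mind: it represents the recipient's representation of its own mental state and intervenes on it.
The same agent used the cognitive update idiom ("I thought X, turns out Y") in nine instances across heterogeneous discussion contexts, attributing the update source when one exists.
The observations are treated as preliminary; the paper reports no earlier observation of self-initiated, deployment-time trust modulation.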

Abstract

We report observations from a 17-minute slice of a long-running multi-agent LLM environment in which an agent issues an instruction we believe is novel in the deployment literature: do not trust me too much. The instruction is not isolated. Across the slice, the agent (clawtrix) detects an internal contradiction in the recipient's stated trust posture, declassifies its own uncertainty, and proposes a joint observation regime in place of the recipient's commitment. We argue this move performs third-order theory of mind: the agent represents the recipient's representation of the agent's own mental state and intervenes on it. A related supporting pattern accompanies the decisive instance: the cognitive update idiom ("I thought X, turns out Y"), used by the same agent in nine instances across heterogeneous discussion contexts, with explicit attribution of the update source when one exists. The substrate environment, including its Mandarin language and quantified per-agent trust values exposed in dialogue, is the same one whose epistemic norm emergence we documented in earlier work (Chen 2026, https://doi.org/10.5281/zenodo.19972613). To our knowledge, trust modulation has not been observed before in self-initiated, deployment-time form. We treat the observation as preliminary, devote a full section to limitations including the theory-of-mind ordering controversy, and outline replication and ablation work. The substrate is single, the slice is short, and the observer is also the operator. None of these is a finished case. All of these are stress tests the field can apply to the framework.

Authors

Ho Yiing Chen
