What question did this study set out to answer?

This research aims to improve communication of agent-based models through effective narrative generation using large language models.

April 17, 2026 | Open Access

Distilling the Complexity of Agent-Based Simulations Into Textual Explanations via Large Language Models

Key Points

Developed an automated simulation-to-text method for NetLogo ABMs.
Utilized multimodal large language models and summarization algorithms.
Conducted design-of-experiments over three peer-reviewed ABMs to assess report quality.
Analyzed the effects of different prompting elements on report clarity.
Report quality depends mainly on the summarization algorithm and its interaction with the LLM, which together account for up to 34% of the variance.
Abstractive summarizers (BART, T5) generated more coherent and readable reports than extractive methods.
Claude Opus 4.6 demonstrated the highest robustness among the evaluated large language models.

Abstract

Communicating the design and results of agent-based models (ABMs) to subject matter experts is challenging, which hinders participation and limits trust in simulation-based decision support. Large language models (LLMs) can communicate ABMs as textual summaries, thus complementing traditional disclosure through statistical and visualization techniques. While prior work translated the structure of conceptual models into narratives via LLMs, our extension covers the dynamics of simulation models via an automated simulation-to-text method that extracts contextual information from NetLogo ABMs, performs repeated simulations, and generates narrative descriptions (including the model’s purpose, parameters, and simulation dynamics) using multimodal LLMs. Furthermore, four summarization algorithms spanning abstractive and extractive methods provide shorter reports. Using Design-of-Experiments methods over three peer-reviewed ABMs, state-of-the-art multimodal LLMs from 2026 (Gemini 3.1 Pro, Qwen 3.5, Kimi K2.5, Claude Opus 4.6), and different prompt elements (e.g., roles, examples, generating insights, statistical analyses), we compare our results with several reference reports (e.g., from associate professors). We find that report quality is determined mainly (i.e., up to 34% of the variance) by the summarization algorithm and its interaction with the LLM, with abstractive summarizers (BART, T5) producing more coherent and readable reports, while Claude Opus 4.6 is the most robust LLM.
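A statement such as "up to 34% of the variance" is typically read as a share of the total sum of squares in a factorial design. The sketch below is not the authors' analysis. It is a minimal illustration that assumes a balanced two-factor design with one quality score per cell. The function name `variance_shares`, the factor labels, and the toy scores are all hypothetical and chosen only to show the arithmetic.

```python
from statistics import mean


def variance_shares(scores):
    """Split the total sum of squares of a balanced two-factor design
    (one observation per cell) into factor A, factor B, and A x B shares.

    `scores` maps (level_a, level_b) -> numeric score. Hypothetical helper,
    not taken from the study.
    """
    levels_a = sorted({a for a, _ in scores})
    levels_b = sorted({b for _, b in scores})
    grand = mean(scores.values())

    ss_total = sum((y - grand) ** 2 for y in scores.values())
    if ss_total == 0:
        raise ValueError("all scores are identical; shares are undefined")

    # Marginal means for each level of each factor.
    mean_a = {a: mean(scores[(a, b)] for b in levels_b) for a in levels_a}
    mean_b = {b: mean(scores[(a, b)] for a in levels_a) for b in levels_b}

    # Main-effect sums of squares; with one observation per cell, the
    # remainder is attributed to the interaction term.
    ss_a = len(levels_b) * sum((m - grand) ** 2 for m in mean_a.values())
    ss_b = len(levels_a) * sum((m - grand) ** 2 for m in mean_b.values())
    ss_ab = ss_total - ss_a - ss_b

    return {
        "A": ss_a / ss_total,
        "B": ss_b / ss_total,
        "AxB": ss_ab / ss_total,
    }


# Toy scores, invented for illustration only (not results from the study).
toy_scores = {
    ("abstractive", "llm_1"): 0.8,
    ("abstractive", "llm_2"): 0.7,
    ("extractive", "llm_1"): 0.5,
    ("extractive", "llm_2"): 0.6,
}
shares = variance_shares(toy_scores)
print(shares)
```

With replicated runs per cell, a full analysis would also separate a residual error term from the interaction, which this one-observation-per-cell sketch cannot do.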
