August 26, 2025 | Open Access

Maynard Smith revisited: A multi-agent reinforcement learning approach to the coevolution of signalling behaviour

Key Points

The analysis shows that multi-agent reinforcement learning does not always result in honest signalling, contrary to previous models.
In most scenarios, the optimal outcome in the Sir Philip Sidney game is one in which the resource is given without any signal being required.
Using multi-agent reinforcement learning, we simulate interactions to explore learning mechanisms across generations.
These findings highlight the importance of learning models for understanding coevolutionary dynamics in both artificial and biological systems.

Abstract

The coevolution of signalling is a complex problem within animal behaviour, and is also central to communication between artificial agents. The Sir Philip Sidney game was designed to model this dyadic interaction from an evolutionary biology perspective, and was formulated to demonstrate the emergence of honest signalling. We use Multi-Agent Reinforcement Learning (MARL) to show that in the majority of cases, the resulting behaviour adopted by agents is not that shown in the original derivation of the model. This paper demonstrates that MARL can be a powerful tool to study evolutionary dynamics and understand the underlying mechanisms of learning over generations; particularly advantageous is the interpretability of this type of approach, as well as the fact that it allows us to study emergent behaviour without the need to constrain the strategy space from the outset. Although the game was originally set out to exemplify honest signalling, we show that it provides no incentive for such behaviour. In the majority of cases, the optimal outcome is one that does not require a signal for the resource to be given. This type of interaction is observed within animal behaviour, and is sometimes denoted proactive prosociality. High learning and low discount rates of the reinforcement learning model are shown to be optimal in order to achieve the outcome that maximises both agents’ reward, and proximity to the given threshold leads to suboptimal learning.
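For readers unfamiliar with the two hyperparameters the abstract refers to, the following is a minimal sketch of a generic tabular Q-learning update, showing where a learning rate and a discount factor enter. It is an illustration only: the abstract does not state which reinforcement learning algorithm the authors used, and the function name `q_update`, the state labels `s0`/`s1`, the action labels `give`/`keep`, and all numeric values are hypothetical.

```python
def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Apply one generic tabular Q-learning step and return the new value.

    alpha is the learning rate (how far the estimate moves toward the target);
    gamma is the discount factor (how much future value counts).
    This is a textbook update, not necessarily the one used in the paper.
    """
    # A terminal transition (next_state is None) has no future value.
    best_next = max(q[next_state].values()) if next_state is not None else 0.0
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical value table for a donor-like agent with two actions.
q = {
    "s0": {"give": 0.0, "keep": 0.0},
    "s1": {"give": 1.0, "keep": 0.5},
}

# Non-terminal step: the estimate moves a fraction alpha toward
# reward + gamma * (best value in the next state).
new_value = q_update(q, "s0", "give", reward=1.0, next_state="s1",
                     alpha=0.5, gamma=0.9)

# Terminal step with alpha = 1: the estimate jumps straight to the reward.
terminal_value = q_update(q, "s0", "keep", reward=1.0, next_state=None,
                          alpha=1.0, gamma=0.9)
```

With a larger `alpha` each observed reward shifts the estimate more strongly, while `gamma` scales the contribution of expected future value; how these correspond to the paper's "learning" and "discount" rates should be checked against the full text.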


Cite This Study

Macmillan-Scott et al. (2025) studied this question.

synapsesocial.com/papers/68af620aad7bf08b1eae2fb1 https://doi.org/10.1371/journal.pcbi.1013302
