December 11, 2025 · Open Access

Multi-Agent Coordination Strategies vs Retrieval-Augmented Generation in LLMs: A Comparative Evaluation

Key Points

This evaluation compares multi-agent coordination strategies with retrieval-augmented generation in language models.
Tested four coordination strategies: collaborative, sequential, competitive, hierarchical.
Evaluated on Mistral 7B, Llama 3.1 8B, and Granite 3.2 8B using 100 question-answer pairs.
Assessed performance with Composite Performance Score (CPS) and Threshold-aware CPS (T-CPS), which aggregate nine metrics.
All multi-agent configurations showed degradation in performance compared to single-agent baselines, ranging from −4.4% to −35.3%.
Llama 3.1 8B tolerated sequential and hierarchical coordination with minimal degradation (−4.9% to −5.3%).
Granite 3.2 8B exhibited 14–35% degradation across all strategies.

Abstract

This paper evaluates multi-agent coordination strategies against single-agent retrieval-augmented generation (RAG) for open-source language models. Four coordination strategies (collaborative, sequential, competitive, hierarchical) were tested across Mistral 7B, Llama 3.1 8B, and Granite 3.2 8B using 100 domain-specific question–answer pairs (3,100 total evaluations). Performance was assessed using Composite Performance Score (CPS) and Threshold-aware CPS (T-CPS), which aggregate nine metrics spanning lexical, semantic, and linguistic dimensions. Under the tested conditions, all 28 multi-agent configurations showed degradation relative to single-agent baselines, ranging from −4.4% to −35.3%. Coordination overhead was identified as a primary contributing factor. Llama 3.1 8B tolerated sequential and hierarchical coordination with minimal degradation (−4.9% to −5.3%). Mistral 7B with shared context retrieval achieved comparable results. Granite 3.2 8B showed degradation of 14–35% across all strategies. Collaborative coordination exhibited the largest degradation across all models. Study limitations include evaluation on a single domain (agriculture), use of 7–8B parameter models, and homogeneous agent architectures. These findings suggest that single-agent RAG may be preferable for factual question-answering tasks in local deployment scenarios with computational constraints. Future research should explore larger models, heterogeneous agent teams, role-specific prompting, and advanced consensus mechanisms.
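
The degradation percentages above read most naturally as a change in score relative to the single-agent baseline. The following is a minimal Python sketch of that arithmetic, not the study's own code. The function names and the example scores are hypothetical. The split of the 3,100 evaluations into 28 multi-agent configurations plus 3 single-agent baselines (one per model) is an assumption that happens to match the reported totals.

```python
def relative_change_pct(multi_agent_score: float, baseline_score: float) -> float:
    """Percent change of a multi-agent score relative to its single-agent baseline.

    Negative values indicate degradation. This sign convention is assumed
    from the way the figures are reported in the abstract.
    """
    if baseline_score == 0:
        raise ValueError("baseline_score must be non-zero")
    return (multi_agent_score - baseline_score) / baseline_score * 100.0


def total_evaluations(n_multi_agent: int, n_baselines: int, n_questions: int) -> int:
    """Number of evaluations if every configuration answers every question once."""
    return (n_multi_agent + n_baselines) * n_questions


if __name__ == "__main__":
    # Hypothetical scores, chosen only to illustrate the sign convention.
    change = relative_change_pct(multi_agent_score=0.57, baseline_score=0.60)
    print(f"{change:+.1f}%")  # prints -5.0%

    # Assumed breakdown: 28 multi-agent configurations + 3 baselines, 100 questions each.
    print(total_evaluations(28, 3, 100))  # prints 3100
```

Under this reading, a reported value of −4.4% means the multi-agent score was about 95.6% of the corresponding single-agent score.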