What type of study is this?

This is a quantitative study.

October 3, 2025

Open Access

MSCoRe: A Benchmark for Multi-Stage Collaborative Reasoning in LLM Agents

Key Points

The proposed MSCoRe benchmark evaluates multi-stage collaborative reasoning in language models, a capability that existing benchmarks focused on isolated tasks or narrow domains overlook.
The benchmark includes 126696 domain-specific question-answering instances spanning the automotive, pharmaceutical, electronics, and energy sectors.
Commercial models performed best across all tasks and scenarios, yet a notable gap in ROUGE scores remains between simple and complex tasks.
Robustness testing showed that LLM performance declines in the presence of noisy data, highlighting a key challenge.

Abstract

Large Language Models (LLMs) have excelled in question-answering (QA) tasks within single domains. However, their reasoning and coordination capabilities in complex, multi-stage scenarios remain underexplored. Existing benchmarks typically focus on isolated tasks or narrow domains, overlooking models' abilities for multi-stage collaboration and optimization without explicit external guidance. To bridge this gap, we propose MSCoRe, a novel benchmark comprising 126696 domain-specific QA instances spanning scenarios in automotive, pharmaceutical, electronics, and energy sectors. The dataset is created using a structured three-phase pipeline: dynamic sampling, iterative question-answer generation, and a multi-level quality assessment to ensure data quality. Tasks are further categorized into three difficulty levels according to stage coverage and complexity. With MSCoRe, we have conducted a comprehensive evaluation of various state-of-the-art LLM agents. The commercial models performed best across all tasks and scenarios, but a notable gap in ROUGE scores remains between simple and complex tasks. We also tested the models' robustness and found that their performance is negatively affected by noisy data. MSCoRe provides a valuable new resource for the community to evaluate and improve multi-stage reasoning in LLM agents. The code and data are available at https://github.com/D3E0-source/MSCoRE.
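The abstract reports results as ROUGE scores but does not say which variant is used. As a rough illustration only, the sketch below computes a ROUGE-L style F1 score from the longest common subsequence of whitespace-separated, lowercased tokens. The function names `lcs_length` and `rouge_l_f1` and the tokenization are assumptions made for this example and are not taken from the MSCoRe code.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L style F1 between a reference answer and a model answer.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of the candidate that matches
    recall = lcs / len(ref)      # share of the reference that is covered
    return 2 * precision * recall / (precision + recall)
```

Under this definition an answer that covers only part of a long reference keeps high precision but loses recall, which is one plausible way a score gap between simple and complex tasks could arise.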

Authors

Yuzhen Lei

Hongbin Xie

Jiaxing Zhao
