What type of study is this?

This is a quantitative study.

September 29, 2025 · Open Access

MTR-Bench: A Comprehensive Benchmark for Multi-Turn Reasoning Evaluation

Key Points

MTR-Bench introduces a comprehensive framework for evaluating multi-turn reasoning in LLMs, a capability that existing evaluations often overlook.
The benchmark includes 3600 instances across 40 tasks in 4 classes, with fine-grained difficulty levels that give detailed insight into reasoning capabilities.
Experiments show that even advanced models struggle with interactive, multi-turn reasoning tasks, indicating a research gap.
The automated evaluation system allows for scalable assessments without the need for human intervention.

Abstract

Recent advances in Large Language Models (LLMs) have shown promising results on complex reasoning tasks. However, current evaluations predominantly focus on single-turn reasoning scenarios, leaving interactive tasks largely unexplored. We attribute this to the absence of comprehensive datasets and scalable automatic evaluation protocols. To fill these gaps, we present MTR-Bench for evaluating LLMs' Multi-Turn Reasoning. Comprising 4 classes, 40 tasks, and 3600 instances, MTR-Bench covers diverse reasoning capabilities, offers fine-grained difficulty levels, and requires multi-turn interaction with its environments. Moreover, MTR-Bench features a fully automated framework spanning both dataset construction and model evaluation, which enables scalable assessment without human intervention. Extensive experiments reveal that even cutting-edge reasoning models fall short on multi-turn, interactive reasoning tasks. Further analysis of these results offers valuable insights for future research on interactive AI systems.

Authors

Xiaoyuan Li

Keqin Bao

Yubo Ma
