What question did this study set out to answer?

The aim is to create a reliable dataset for multi-type reasoning question answering to address logical confusion in language models.

February 26, 2026

A dataset of multi-type reasoning question answering (MTR-QA)

Key Points

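Collected civil service exam questions from the past 15 years, together with authoritative mock question banks
Cleaned the data with a multi-stage text processing framework: text standardisation, hash deduplication, near-duplicate detection, and semantic embedding similarity filtering
Designed a multi-model evaluation mechanism integrating GPT-4, DeepSeek-V1, and Qwen-2.5
Evaluated data across integrity (CPL), accuracy (ACC), security (SFC), and chain-of-thought quality (CoT-Q)
Created a dataset with 24,312 high-quality entries
Entries are categorized into four reasoning types: logic, semantics, mathematics, and comprehensive knowledge
The dataset is stored in JSON format and totals 34.1 MB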
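The Key Points name a multi-model evaluation over four dimensions but do not say how the judges' scores are combined. The sketch below is a minimal illustration under stated assumptions, not the authors' method: each judge model (GPT-4, DeepSeek-V1 and Qwen-2.5, per the abstract) scores a candidate entry on CPL, ACC, SFC and CoT-Q on a hypothetical 0-1 scale, scores are averaged per dimension, and the entry is kept only if every average reaches a hypothetical threshold of 0.8. The names `ModelScores`, `aggregate`, `accept` and `THRESHOLD` are illustrative.

```python
from dataclasses import dataclass
from statistics import mean

# Quality dimensions named in the study: integrity, accuracy, security, CoT quality.
DIMENSIONS = ("CPL", "ACC", "SFC", "CoT-Q")

# Hypothetical acceptance threshold on an assumed 0-1 scale; not taken from the paper.
THRESHOLD = 0.8


@dataclass
class ModelScores:
    """Scores that one judge model assigned to one candidate entry (hypothetical)."""
    model: str
    scores: dict[str, float]


def aggregate(judgements: list[ModelScores]) -> dict[str, float]:
    """Average each dimension across all judge models."""
    return {dim: mean(j.scores[dim] for j in judgements) for dim in DIMENSIONS}


def accept(judgements: list[ModelScores], threshold: float = THRESHOLD) -> bool:
    """Keep an entry only if every averaged dimension reaches the threshold."""
    averaged = aggregate(judgements)
    return all(averaged[dim] >= threshold for dim in DIMENSIONS)


if __name__ == "__main__":
    judges = [
        ModelScores("GPT-4", {"CPL": 1.0, "ACC": 1.0, "SFC": 1.0, "CoT-Q": 0.9}),
        ModelScores("DeepSeek-V1", {"CPL": 1.0, "ACC": 0.9, "SFC": 1.0, "CoT-Q": 0.9}),
        ModelScores("Qwen-2.5", {"CPL": 0.9, "ACC": 1.0, "SFC": 1.0, "CoT-Q": 1.0}),
    ]
    print(accept(judges))  # prints True
```

Requiring every dimension to pass, rather than thresholding one overall mean, stops a high integrity score from masking a safety failure. A majority vote or a per-dimension minimum across judges are equally plausible rules, and the study's actual combination rule may differ.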

Abstract

Large language models are prone to logical confusion and have limited ability to capture implicit relationships when handling complex reasoning tasks. To address these problems, this paper proposes and constructs a high-quality multi-type reasoning question answering dataset (MTR-QA). Civil service examination questions from the past 15 years and authoritative mock question banks were collected and sorted, then cleaned with a multi-stage text processing framework comprising text standardisation, hash deduplication, near-duplicate detection and semantic embedding similarity filtering, which effectively reduces redundancy and noise. To ensure data reliability, this paper designs a multi-model evaluation mechanism integrating GPT-4, DeepSeek-V1 and Qwen-2.5 that quantitatively evaluates the data along four dimensions: integrity (CPL), accuracy (ACC), security (SFC) and chain-of-thought quality (CoT-Q). In the end, 24,312 high-quality entries were selected and stored in JSON format, totalling 34.1 MB. Each sample contains six attributes (question, options, answer, chain of thought, type and difficulty level) and belongs to one of four core reasoning types: logic, semantics, mathematics and comprehensive knowledge reasoning. MTR-QA broadens coverage of reasoning types and topics, providing a reliable data foundation for pre-training, supervised fine-tuning and evaluation of large language models, and supporting improvements in their performance on complex reasoning and question answering.
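The four cleaning stages named in the abstract can be sketched as a single pass that drops a question at the first stage that flags it. This is a minimal illustration under assumptions, not the authors' code: standardisation is taken to mean NFKC normalisation, lower-casing and whitespace collapsing; hash deduplication uses SHA-256; near-duplicate detection uses character-trigram Jaccard similarity; and `embed` is a hypothetical placeholder (character-bigram counts) standing in for a real sentence-embedding model, which the abstract does not name. All function names and both thresholds are illustrative.

```python
import hashlib
import math
import re
import unicodedata
from collections import Counter


def standardise(text: str) -> str:
    """Stage 1: Unicode NFKC normalisation, lower-casing and whitespace collapsing."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def content_hash(text: str) -> str:
    """Stage 2 helper: SHA-256 digest used to drop exact duplicates."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def shingles(text: str, n: int = 3) -> set[str]:
    """Stage 3 helper: character n-grams, which need no word segmentation."""
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def embed(text: str) -> Counter:
    """Stage 4 PLACEHOLDER (hypothetical): character-bigram counts standing in
    for a real sentence-embedding model."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def clean(questions: list[str], near_dup: float = 0.8, semantic_dup: float = 0.95) -> list[str]:
    """Run the four stages in order; both thresholds are hypothetical."""
    kept: list[str] = []
    seen_hashes: set[str] = set()
    kept_shingles: list[set[str]] = []
    kept_vectors: list[Counter] = []
    for raw in questions:
        text = standardise(raw)                                          # stage 1
        digest = content_hash(text)
        if digest in seen_hashes:                                        # stage 2
            continue
        seen_hashes.add(digest)
        grams = shingles(text)
        if any(jaccard(grams, g) >= near_dup for g in kept_shingles):    # stage 3
            continue
        vector = embed(text)
        if any(cosine(vector, v) >= semantic_dup for v in kept_vectors): # stage 4
            continue
        kept.append(text)
        kept_shingles.append(grams)
        kept_vectors.append(vector)
    return kept


if __name__ == "__main__":
    sample = [
        "Which  number comes next: 2, 4, 8, ?",
        "which number comes next: 2, 4, 8, ?",    # exact duplicate after stage 1
        "Which number comes next: 2, 4, 8, 16?",  # near duplicate (trigram Jaccard 32/36)
        "All cats are mammals. Some mammals fly. Which follows?",
    ]
    print(clean(sample))  # keeps the first and last questions
```

Each check here compares a candidate against every kept entry, which is quadratic in the number of questions. At the scale of tens of thousands of entries, MinHash with locality-sensitive hashing for stage 3 and an approximate nearest-neighbour index over real embeddings for stage 4 are the usual substitutes.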
