What question did this study set out to answer?

The aim is to improve multiple-choice question answering (MCQA) accuracy using a structured elimination framework.

March 21, 2026 · Open Access

Structured elimination with self-consistency and verification for robust multiple-choice reasoning: a large-scale sports training benchmark and cross-domain evaluation

Key Points

Developed a model-agnostic structured elimination framework integrating verification and self-consistency.
Used LLaMA-3 (8B) as the primary backbone for multi-round option elimination.
Implemented optional verification of eliminations via internal checks or lightweight evidence retrieval (e.g., Wikipedia).
Created SportsMCQ-5k, a benchmark of 5,000 sports training multiple-choice questions, and evaluated on it alongside CommonsenseQA, Social IQa, and MedMCQA.
Achieved a consistent accuracy improvement of 4–7 points over strong 7B–9B open-source baselines across these datasets.
Confirmed contributions of verification and self-consistency through ablation studies.

Abstract

Large language models (LLMs) excel at many QA tasks but still struggle with multiple-choice question answering (MCQA), especially under strong distractors. Humans often solve such questions by eliminating implausible options and verifying the remaining candidates. We propose a model-agnostic structured elimination framework that unifies stepwise elimination, answer verification, and self-consistency. Instantiated with LLaMA-3 (8B) as the primary backbone, the model performs multi-round option elimination, optionally verifies eliminations via internal checks or lightweight evidence retrieval (e.g., Wikipedia), and aggregates multiple sampled elimination chains for robust decisions. We introduce SportsMCQ-5k, a 5,000-question sports training MCQA benchmark, and evaluate on it alongside CommonsenseQA, Social IQa, and MedMCQA. Across datasets, our method consistently improves accuracy over strong 7B–9B open-source baselines by 4–7 points, while ablations confirm the contributions of verification and self-consistency. The proposed framework enhances robustness and interpretability for educational assessment, including sports training and other discipline-specific testing.
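
The pipeline described in the abstract (multi-round elimination, optional verification, and voting over several sampled chains) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `eliminate_chain`, `structured_elimination`, and `toy_scorer`, the scorer and verifier interfaces, the fallback rule when no elimination passes verification, and the example question are all assumptions made for illustration. In particular, `toy_scorer` stands in for an LLM call and simply returns noisy hand-written plausibility scores.

```python
import random
from collections import Counter
from typing import Callable, Dict, Optional, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
# a scorer maps the remaining options to plausibility scores, and a
# verifier returns True when eliminating a given option looks justified.
Scorer = Callable[[str, Dict[str, str], random.Random], Dict[str, float]]
Verifier = Callable[[str, str, str], bool]


def eliminate_chain(question: str, options: Dict[str, str], scorer: Scorer,
                    rng: random.Random,
                    verifier: Optional[Verifier] = None) -> str:
    """Run one elimination chain and return the label of the surviving option."""
    remaining = dict(options)
    while len(remaining) > 1:
        scores = scorer(question, remaining, rng)
        # Rank options from least to most plausible.
        ranked = sorted(remaining, key=lambda label: scores[label])
        victim = ranked[0]  # fallback if no elimination passes verification
        if verifier is not None:
            for label in ranked:
                if verifier(question, label, remaining[label]):
                    victim = label
                    break
        del remaining[victim]
    return next(iter(remaining))


def structured_elimination(question: str, options: Dict[str, str],
                           scorer: Scorer, n_chains: int = 5, seed: int = 0,
                           verifier: Optional[Verifier] = None
                           ) -> Tuple[str, Counter]:
    """Sample several elimination chains and take a majority vote."""
    rng = random.Random(seed)
    votes = Counter(
        eliminate_chain(question, options, scorer, rng, verifier)
        for _ in range(n_chains)
    )
    # Ties are broken by the order in which answers were first seen.
    return votes.most_common(1)[0][0], votes


def toy_scorer(question: str, remaining: Dict[str, str],
               rng: random.Random) -> Dict[str, float]:
    """Stand-in for an LLM: fixed base plausibility plus bounded sampling noise."""
    base = {"A": 0.1, "B": 0.9, "C": 0.3, "D": 0.2}
    return {label: base[label] + rng.uniform(-0.2, 0.2) for label in remaining}


# Made-up example question; it is not taken from SportsMCQ-5k.
options = {
    "A": "Aerobic oxidation",
    "B": "Phosphagen (ATP-PCr) system",
    "C": "Slow glycolysis",
    "D": "Fat oxidation",
}
answer, votes = structured_elimination(
    "Which energy system dominates a 100 m sprint?", options, toy_scorer
)
print(answer, dict(votes))
```

In a real setting the scorer would prompt the backbone model with the question and the remaining options, and the verifier would run an internal check or a retrieval lookup before an option is discarded. Sampling several chains and voting is what makes the final decision less sensitive to a single bad elimination.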

Authors

Ye Teng

Cheng Wang

Journals

Journal of King Saud University - Computer and Information Sciences

Institutions

East China University of Technology
