What question did this study set out to answer?

The aim is to investigate the systematic positional bias in large language models' responses to multiple-choice questions.

January 14, 2026 · Open Access

Positional Bias in Large Language Models: The Persistent Statistical Preference for Middle Answer Choices in Multiple-Choice Question Generation and Answering

Key Points

The study investigates systematic positional bias in how large language models place and select correct answers in multiple-choice questions.
The analysis covers multiple model families, including GPT, Claude, and Llama.
It examines human-authored training data artifacts and transformer positional encoding mechanisms as contributing causes.
It evaluates mitigation strategies, including answer shuffling and fine-tuning on balanced datasets.
Correct answers are disproportionately placed or selected in middle positions (typically options B and C).
Instructing models to randomize answer placement does not eliminate the positional bias.
Mechanistic interpretability research identifies specific attention heads and MLP layers that encode positional preferences.
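
One way to check the "middle positions" claim on a set of generated questions is to count how often each letter is the answer key and compare the counts with a uniform split. The sketch below is illustrative only: the function names and the answer-key counts are hypothetical and are not taken from the paper. It computes a chi-square goodness-of-fit statistic by hand.

```python
from collections import Counter


def position_counts(answer_keys, labels="ABCD"):
    """Count how often each option label is the correct answer."""
    counts = Counter(answer_keys)
    return {label: counts.get(label, 0) for label in labels}


def chi_square_uniform(counts):
    """Chi-square statistic against a uniform distribution over positions."""
    total = sum(counts.values())
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts.values())


# Hypothetical answer keys for 100 generated questions (made-up numbers).
keys = ["A"] * 10 + ["B"] * 40 + ["C"] * 35 + ["D"] * 15

counts = position_counts(keys)
statistic = chi_square_uniform(counts)
print(counts)     # per-position counts
print(statistic)  # compare with the chi-square critical value for 3 degrees of freedom
```

With four options there are three degrees of freedom, so a statistic above roughly 7.81 would reject uniform placement at the 5% level.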

Abstract

This paper investigates a systematic positional bias in large language models (LLMs), where correct answers in multiple-choice questions are disproportionately placed or selected in middle positions (typically options B and C), even when models are explicitly instructed to randomize answer placement. Through a comprehensive technical analysis, the paper examines converging causes of this bias, including human-authored training data artifacts, transformer positional encoding mechanisms, token probability smoothing, and limitations of reinforcement learning from human feedback (RLHF). Empirical observations across multiple model families (GPT, Claude, Llama) are synthesized with recent findings from mechanistic interpretability research that identify specific attention heads and MLP layers responsible for encoding positional preferences. The work further explains why verbal prompt instructions fail to eliminate this behavior and evaluates evidence-based mitigation strategies such as answer shuffling, majority voting over permutations, logit biasing, and fine-tuning on balanced datasets. The paper concludes by discussing implications for benchmark integrity, educational assessment fairness, and LLM-based evaluation systems. This publication is intended as an open-access research preprint for researchers and practitioners working on large language models, evaluation methodology, and AI reliability.
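
Of the mitigation strategies listed above, majority voting over permutations is the simplest to sketch. The example below is a minimal illustration under stated assumptions, not the paper's implementation: `answer_fn` is a hypothetical callable standing in for a model call that returns the index of the chosen slot, and `toy_middle_biased_model` is a made-up answerer that only gets the question right when the correct option sits in a middle slot.

```python
import itertools
from collections import Counter


def vote_over_permutations(question, options, answer_fn, max_perms=24):
    """Ask the same question under reordered options and vote on option content.

    answer_fn(question, shuffled_options) -> index of the chosen slot.
    (Hypothetical interface; a real system would wrap a model call here.)
    """
    votes = Counter()
    orderings = itertools.permutations(range(len(options)))
    for perm in itertools.islice(orderings, max_perms):
        shuffled = [options[i] for i in perm]
        slot = answer_fn(question, shuffled)
        # Map the chosen slot back to the option text, so votes are
        # counted per answer content rather than per letter position.
        votes[shuffled[slot]] += 1
    winner, _ = votes.most_common(1)[0]
    return winner, dict(votes)


def toy_middle_biased_model(question, shuffled):
    """Made-up answerer: correct only when 'Paris' is in slot B or C,
    otherwise it falls back to slot B regardless of content."""
    for slot in (1, 2):
        if shuffled[slot] == "Paris":
            return slot
    return 1


options = ["Paris", "Rome", "Madrid", "Berlin"]
winner, votes = vote_over_permutations(
    "What is the capital of France?", options, toy_middle_biased_model
)
print(winner, votes)
```

In this toy setup a single query with "Paris" in the first slot returns the wrong option, while the vote across all 24 orderings recovers it, because the position-driven errors are spread evenly over the distractors. The cost is one model call per ordering, which is why `max_perms` caps the number of permutations.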

Authors

Karim Habib

Institutions

Helwan University
