This paper investigates a systematic positional bias in large language models (LLMs), where correct answers in multiple-choice questions are disproportionately placed or selected in middle positions (typically options B and C), even when models are explicitly instructed to randomize answer placement. Through a comprehensive technical analysis, the paper examines converging causes of this bias, including human-authored training data artifacts, transformer positional encoding mechanisms, token probability smoothing, and limitations of reinforcement learning from human feedback (RLHF). Empirical observations across multiple model families (GPT, Claude, Llama) are synthesized with recent findings from mechanistic interpretability research that identify specific attention heads and MLP layers responsible for encoding positional preferences. The work further explains why verbal prompt instructions fail to eliminate this behavior and evaluates evidence-based mitigation strategies such as answer shuffling, majority voting over permutations, logit biasing, and fine-tuning on balanced datasets. The paper concludes by discussing implications for benchmark integrity, educational assessment fairness, and LLM-based evaluation systems. This publication is intended as an open-access research preprint for researchers and practitioners working on large language models, evaluation methodology, and AI reliability.
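Two of the mitigation strategies named above, answer shuffling and majority voting over permutations, can be illustrated with a minimal sketch. This is not the paper's implementation: the function `debiased_answer`, its parameters, and the `ask_model` callable (which stands in for a real LLM call and is assumed to return the index of the option it picks in the order shown) are all hypothetical names introduced for illustration.

```python
import random
from collections import Counter
from typing import Callable, Sequence


def debiased_answer(
    question: str,
    options: Sequence[str],
    ask_model: Callable[[str, Sequence[str]], int],  # hypothetical LLM wrapper
    n_permutations: int = 8,
    seed: int = 0,
) -> str:
    """Ask the same question under several random option orders and
    return the option content that wins the most votes."""
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(n_permutations):
        # order[k] is the original index of the option shown at position k.
        order = list(range(len(options)))
        rng.shuffle(order)
        shuffled = [options[i] for i in order]
        picked_position = ask_model(question, shuffled)
        # Map the picked position back to the original option before voting,
        # so votes accumulate on content rather than on a letter slot.
        votes[order[picked_position]] += 1
    winner, _ = votes.most_common(1)[0]
    return options[winner]


if __name__ == "__main__":
    # Toy stand-in for a model that answers by content, not by position.
    def toy_model(question: str, shown: Sequence[str]) -> int:
        return list(shown).index("Paris")

    choices = ["Lyon", "Paris", "Nice", "Lille"]
    print(debiased_answer("Capital of France?", choices, toy_model))
```

Because votes are tallied on option content, a preference for a fixed slot such as B or C is spread across different options from one permutation to the next, while a content-driven preference accumulates on a single option. The number of permutations trades cost against robustness and is an assumption to tune.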
Karim Habib
Helwan University
Karim Habib (Mon,) studied this question.
www.synapsesocial.com/papers/696718e287ba607552bb8e0c — DOI: https://doi.org/10.5281/zenodo.18226415