Students increasingly use Large Language Models (LLMs) to answer multiple-choice questions (MCQs) without engaging with the underlying concepts, which undermines the validity of traditional assessments. Rather than detecting or restricting LLM usage, an alternative strategy is to redesign MCQs so that relying on LLMs becomes ineffective. In this paper, we propose GradQuiz, a framework that generates adversarial distractors targeting the decision-making process of LLMs. GradQuiz uses gradient-based signals from a target LLM to perturb semantically influential entities in an MCQ, producing plausible distractors that are then refined for grammatical and semantic coherence. We evaluated GradQuiz on two MCQ benchmarks, OpenTriviaQA and Massive Multitask Language Understanding (MMLU), across multiple LLM families, including both open-weight models and proprietary black-box models. Results show that GradQuiz consistently reduces LLM accuracy on MCQs. In particular, when applied to gemma-3-27b-it on MMLU, it reduces accuracy by 67.26% compared with the original quizzes and by 51.07% compared with the strongest competing adversarial-distractor approach; on OpenTriviaQA, the corresponding reductions are 49.31% and 42.92%. Human evaluations confirm that the generated distractors preserve pedagogical coherence and relevance, and a controlled study with students shows that GradQuiz does not increase MCQ difficulty for learners who do not rely on LLMs to answer quizzes.
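The abstract does not say how the gradient signals single out "semantically influential entities". As a generic illustration only, the sketch below ranks tokens by gradient-times-input saliency under a toy linear scorer (a dot product between hypothetical weights and the mean token embedding). The scorer, the embeddings, and all function names are hypothetical assumptions and are not GradQuiz's implementation; in a real LLM the gradients would come from backpropagation rather than from a closed form.

```python
from typing import List, Tuple

Vector = List[float]


def option_score(token_embeddings: List[Vector], weights: Vector) -> float:
    """Hypothetical toy scorer: dot(weights, mean token embedding)."""
    n = len(token_embeddings)
    dim = len(weights)
    mean = [sum(emb[d] for emb in token_embeddings) / n for d in range(dim)]
    return sum(w * m for w, m in zip(weights, mean))


def score_gradients(token_embeddings: List[Vector], weights: Vector) -> List[Vector]:
    """Analytic gradient of option_score w.r.t. each token embedding.

    Since score = sum_d w_d * (1/n) * sum_i e_i[d], every token's gradient is w / n.
    """
    n = len(token_embeddings)
    return [[w / n for w in weights] for _ in token_embeddings]


def saliency(tokens: List[str], token_embeddings: List[Vector],
             weights: Vector) -> List[Tuple[str, float]]:
    """Rank tokens by |gradient . embedding| (gradient-times-input), highest first."""
    grads = score_gradients(token_embeddings, weights)
    ranked = []
    for tok, emb, grad in zip(tokens, token_embeddings, grads):
        ranked.append((tok, abs(sum(g * e for g, e in zip(grad, emb)))))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings for three tokens of a question stem.
    tokens = ["Paris", "is", "capital"]
    embeddings = [[2.0, 1.0], [0.1, 0.0], [0.5, 1.5]]
    print(saliency(tokens, embeddings, [1.0, 0.5]))
```

In this toy setting the highest-ranked tokens would be the candidates to perturb when building a distractor; gradient-times-input is only one of several saliency choices, picked here because its gradient can be checked by hand.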
Gianluca Bonifazi
Christopher Buratti
Michele Marchetti
ACM Transactions on Intelligent Systems and Technology
University of Modena and Reggio Emilia
Marche Polytechnic University
www.synapsesocial.com/papers/69c620be15a0a509bde1948d — DOI: https://doi.org/10.1145/3803797