March 27, 2026

Leveraging Adversarial Attacks to Generate Multiple-Choice Quizzes Robust Against Large Language Models

Key Points

The aim is to create a framework that generates adversarial distractors to counter reliance on Large Language Models (LLMs) in multiple-choice questions.
Developed GradQuiz framework to generate adversarial distractors.
Utilized gradient-based signals from target LLMs for semantic perturbations.
Evaluated on two MCQ benchmarks, OpenTriviaQA and MMLU, across multiple LLM families, including open-weight and proprietary black-box models.
Human evaluations assessed pedagogical relevance and coherence of distractors.
A controlled study examined whether the generated quizzes are harder for students who answer without LLMs.
On MMLU with gemma-3-27b-it, GradQuiz reduced LLM accuracy by 67.26% compared to the original MCQs and by 51.07% compared to the strongest competing adversarial distractor approach.
On OpenTriviaQA, the corresponding accuracy reductions were 49.31% and 42.92%.
Human evaluations confirmed distractor relevance and coherence.
GradQuiz did not increase quiz difficulty for students who do not rely on LLMs.

Abstract

Students are increasingly using Large Language Models (LLMs) to answer multiple-choice questions (MCQs) without engaging with the underlying concepts, which undermines the validity of traditional assessments. Instead of trying to detect or restrict LLM usage, an alternative strategy is redesigning MCQs so that relying on LLMs becomes ineffective. In this paper, we propose GradQuiz, a framework that generates adversarial distractors targeting the decision-making process of LLMs. GradQuiz leverages gradient-based signals from a target LLM to perturb semantically influential entities in an MCQ. This process generates plausible distractors, which are then refined to ensure grammatical and semantic coherence. We evaluated GradQuiz on two MCQ benchmarks, i.e., OpenTriviaQA and Massive Multitask Language Understanding (MMLU), across multiple LLM families, including both open-weight and proprietary black-box models. Results show that GradQuiz consistently reduces LLM accuracy in answering MCQs. In particular, when applied to gemma-3-27b-it, it reduces accuracy by 67.26% compared to the original quizzes and by 51.07% compared to the strongest competing adversarial distractor approach on the MMLU dataset, while it achieves accuracy reductions of 49.31% and 42.92%, respectively, on the OpenTriviaQA dataset. Human evaluations confirm that the generated distractors preserve pedagogical coherence and relevance, while a controlled study on students shows that GradQuiz does not increase MCQ difficulty for learners who do not rely on LLMs to answer quizzes.
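The abstract does not spell out how gradient signals identify influential parts of a question, so the following is only a toy sketch of one common technique, gradient-times-input saliency, and not the GradQuiz implementation. It assumes a hypothetical linear stand-in for an LLM that mean-pools token embeddings and scores the correct option with a dot product. The tokens, embeddings, weights, and function names are all invented for illustration.

```python
# Toy illustration only: NOT the GradQuiz implementation.
# The model, embeddings, and weights below are hypothetical.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def answer_logit(token_embeddings, answer_direction):
    """Stand-in for an LLM: logit of the correct option from mean-pooled embeddings."""
    n = len(token_embeddings)
    pooled = [sum(column) / n for column in zip(*token_embeddings)]
    return dot(pooled, answer_direction)

def gradient_x_input_saliency(token_embeddings, answer_direction):
    """Per-token gradient-times-input score for the toy linear model.

    For this model, d(logit)/d(e_i) = answer_direction / n, so the score
    of token i is |e_i . answer_direction| / n.
    """
    n = len(token_embeddings)
    return [abs(dot(e, answer_direction)) / n for e in token_embeddings]

def most_influential(tokens, token_embeddings, answer_direction):
    """Return the highest-saliency token together with all scores."""
    scores = gradient_x_input_saliency(token_embeddings, answer_direction)
    best = max(range(len(tokens)), key=lambda i: scores[i])
    return tokens[best], scores

# Hypothetical question tokens with made-up 2-dimensional embeddings.
tokens = ["Which", "planet", "has", "rings"]
embeddings = [
    [0.1, 0.0],
    [0.2, 0.9],
    [0.0, 0.1],
    [0.8, 0.3],
]
answer_direction = [0.5, 1.0]

token, scores = most_influential(tokens, embeddings, answer_direction)
print(token, [round(s, 4) for s in scores])
```

In a real setting the gradient would come from backpropagation through the target LLM rather than a closed-form expression, and the highest-scoring entities would be candidates for perturbation when building distractors.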

Authors

Gianluca Bonifazi

Christopher Buratti

Michele Marchetti

Journals

ACM Transactions on Intelligent Systems and Technology

Institutions

University of Modena and Reggio Emilia

Marche Polytechnic University
