What question did this study set out to answer?

The goal is to develop assessment items for programming that measure learners' skills rather than their access to advanced models.

May 14, 2026 (Open Access)

Novel Problem Generation for Programming Pedagogy in the LLM Era: A Non-Leakage Methodology

Key Points

Aims to develop programming assessment items that measure learners' skills rather than their access to advanced models.
Synthesises the training-set-contamination and LLM literature to inform the methodology.
Develops a four-rule non-leakage criterion set for problem generation.
Applies three generation techniques: parametric variation, unique-context grounding, and compositional novelty.
Introduces a structured workflow for generating programming problems that mitigates leakage risk.
Proposes a design recommendation for the percentage share of each technique in a problem set.
Acknowledges that novelty in programming problems is a moving target as training data and models advance.
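
To make the idea of parametric variation concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's procedure: the template problem (counting pairs whose sum is divisible by k), the parameter ranges, and the function name `make_variant` are all hypothetical. The point it shows is that a seed fixes both the problem statement and its reference answer, so each learner or sitting can receive a distinct but reproducible instance.

```python
import random


def make_variant(seed: int) -> dict:
    """Build one seeded instance of a hypothetical template problem.

    The template and the parameter ranges below are illustrative
    assumptions; they are not taken from the paper.
    """
    rng = random.Random(seed)  # private generator, so the seed fully determines the variant
    k = rng.randint(3, 11)
    n = rng.randint(5, 8)
    values = [rng.randint(1, 50) for _ in range(n)]

    statement = (
        f"Given the list {values}, count the pairs (i, j) with i < j "
        f"whose sum is divisible by {k}."
    )
    # Reference answer by brute force; the lists are small enough for O(n^2).
    answer = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if (values[i] + values[j]) % k == 0
    )
    return {"seed": seed, "k": k, "values": values,
            "statement": statement, "answer": answer}


if __name__ == "__main__":
    variant = make_variant(2026)
    print(variant["statement"])
    print("reference answer:", variant["answer"])
```

Changing only surface parameters like this does not by itself establish that a problem is free of leakage; in the methodology described here that check is a separate, explicit step.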

Abstract

Vietnamese chuyên Tin and IOI-feeder programming pedagogy now operates inside a sharp constraint: the canonical public competitive-programming corpus (Codeforces archive, USACO, AtCoder, VNOI, Project Euler) and the standard programming-LLM benchmarks (HumanEval, MBPP, APPS) are already absorbed into contemporary code-capable model pre-training, so any assessment item that names a known problem, a known solution kernel, or a known reduction is leakage-vulnerable by construction. Reusing the public corpus as-is for assessment thus measures the learner's access to a model more than the learner's mastery of the underlying skill. This paper synthesises the training-set-contamination literature (Sainz et al. 2023; Riddell, Ni and Cohan 2024; Yang et al. 2023; Roberts et al. 2023; Jain et al. 2024 LiveCodeBench; Carlini et al. 2021), the LLM-hallucination and verification literature (Magesh et al. 2024; Mitchell et al. 2023 DetectGPT), the parametric-exercise-generation tradition in CS education (Ihantola et al. 2010; Edwards et al. 2014 Pythy), variation theory (Marton and Booth 1997; Marton, Runesson and Tsui 2004), code-similarity detection (Aiken et al. 2003 MOSS winnowing; Prechelt et al. 2002 JPlag), fuzzing-for-coverage discipline (Sutton et al. 2007; Klees et al. 2018), problem-posing in mathematics education (Pólya 1945; Singer, Ellerton and Cai 2013), competitive-programming setter craft (Paşin, Schmid and Sundholm 2022; IOI 2024 Syllabus), and the AI-era assessment-design empirical anchor (Bastani et al. 2024 GPT-Tutor RCT; Kosmyna et al. 2025 cognitive-debt EEG) into a four-rule non-leakage criterion set and a four-technique methodology (T1 parametric variation, T2 unique-context grounding, T3 compositional novelty, T4 audit-via-LLM-self-test), applied as a problem-set design recommendation (60 percent T1 / 25 percent T2 / 15 percent T3, all gated by T4) with an explicit deployment workflow and audit-loop integration.
The methodology acknowledges that novelty is a moving target against an advancing training cutoff; the contribution is a structurally durable workflow, not a permanent solution.
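
The 60 / 25 / 15 recommendation can be turned into item counts for a problem set of a given size. The sketch below is one possible reading, not the paper's own tooling. The largest-remainder rounding rule, the function names `allocate` and `gate_with_t4`, and the `passes_audit` callback standing in for the T4 self-test are all assumptions introduced for illustration.

```python
from fractions import Fraction

# Shares stated in the abstract: 60 percent T1, 25 percent T2, 15 percent T3.
SHARES = {
    "T1": Fraction(60, 100),
    "T2": Fraction(25, 100),
    "T3": Fraction(15, 100),
}


def allocate(total_items: int) -> dict:
    """Split total_items across T1/T2/T3.

    Rounding uses the largest-remainder rule, which is an assumption:
    the abstract gives only the percentages, not a rounding rule.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    exact = {t: share * total_items for t, share in SHARES.items()}
    counts = {t: int(v) for t, v in exact.items()}  # floor, since values are >= 0
    leftover = total_items - sum(counts.values())
    # Give the remaining items to the techniques with the largest fractional parts.
    by_remainder = sorted(SHARES, key=lambda t: exact[t] - counts[t], reverse=True)
    for t in by_remainder[:leftover]:
        counts[t] += 1
    return counts


def gate_with_t4(candidates, passes_audit):
    """Keep only candidates accepted by a caller-supplied audit check.

    passes_audit is a hypothetical stand-in for the T4 LLM self-test.
    """
    return [c for c in candidates if passes_audit(c)]


if __name__ == "__main__":
    print(allocate(20))
```

For a 20-item set the shares divide evenly into 12, 5, and 3 items. For sizes where they do not, the rounding rule decides which technique receives the extra item, so the choice of rule is worth stating explicitly in a deployment.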

Authors

That Le

ElSohly Laboratories (United States)
