To combat the misuse of large language models (LLMs), many recent studies have presented LLM-generated text detectors with promising performance. When users instruct LLMs to generate text, the instruction can include different constraints depending on the user's needs. Prior studies on prompt sensitivity have shown that even small differences in instructions can substantially alter the quality and characteristics of generated texts. However, most recent studies have not covered such diverse instruction patterns when creating datasets for LLM detection. In this study, we systematically examined the robustness of detectors to instruction diversity through task-oriented constraints, which appear naturally in instructions and are unrelated to detection evasion. We demonstrated that even powerful detectors exhibit a large variance in detection performance under such constraints. Focusing on student essay writing as a realistic domain, we manually created task-oriented constraints based on several essay quality factors. Our experiments showed that the standard deviation (SD) of current detectors' performance on texts generated from instructions with such constraints is significantly larger (up to an SD of 14.4 in F1-score) than the SD obtained by generating texts multiple times or by paraphrasing the instruction. We also observed an overall trend in which the constraints made LLM detection more challenging than it is without them. Our analysis suggests that this variance cannot be attributed to trivial output variation across constraints or to fluctuations tied to the average performance level, but instead stems from detector vulnerabilities specific to these constraints. In particular, detectors exhibit large performance degradation under constraints on the vocabulary or style of the generated texts. Finally, in probing the cause of this effect, we found that the high instruction-following ability of LLMs amplifies the impact of such constraints on detection performance.
To facilitate further development of robust detectors against diverse instructions, we released our datasets at https://github.com/ryuryukke/HowYouPromptMatters.
Koike et al. studied this question.
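The comparison behind the reported SD can be illustrated with a minimal sketch: compute a detector's F1-score once per instruction variant, then take the standard deviation of those scores, and do the same across repeated generations from a single instruction. Everything below is an assumption made for illustration: the helper names `f1_score` and `f1_spread`, the use of the sample SD, and the toy labels and predictions, which are invented and are not results or data from the study.

```python
from statistics import stdev


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class (here: 1 = LLM-generated, 0 = human-written)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_spread(runs):
    """Sample SD of F1 (in percentage points) across named evaluation runs.

    `runs` maps a run name to a (y_true, y_pred) pair. Using the sample SD
    is an assumption of this sketch.
    """
    scores = [100 * f1_score(y_true, y_pred) for y_true, y_pred in runs.values()]
    return stdev(scores)


# Toy labels and detector predictions, invented purely for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]

# One run per (hypothetical) task-oriented constraint in the instruction.
by_constraint = {
    "no_constraint": (y_true, [1, 1, 1, 1, 0, 0, 0, 0]),
    "vocabulary": (y_true, [1, 0, 0, 0, 0, 0, 0, 0]),
    "style": (y_true, [1, 1, 0, 0, 0, 0, 0, 1]),
}

# One run per repeated generation from the same unconstrained instruction.
by_regeneration = {
    "run_1": (y_true, [1, 1, 1, 1, 0, 0, 0, 0]),
    "run_2": (y_true, [1, 1, 1, 0, 0, 0, 0, 0]),
    "run_3": (y_true, [1, 1, 1, 1, 0, 0, 0, 1]),
}

print(f"SD across constraints:   {f1_spread(by_constraint):.1f}")
print(f"SD across regenerations: {f1_spread(by_regeneration):.1f}")
```

In this toy setup the spread across constraints exceeds the spread across regenerations, which mirrors the shape of the comparison described in the abstract without reproducing any of its numbers.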