This conference paper explores the use of large language models (LLMs) in writing assessment within Medical English and English for Specific Purposes (ESP) contexts. The study compares teacher feedback with AI-generated feedback on 50 Medical English essays, focusing on the role of prompt engineering in shaping feedback quality. Three conditions were analysed: teacher feedback, AI-minimalist prompting, and AI-structured prompting. Results show that structured prompt engineering produces more detailed, extensive, and systematic feedback, with higher comment counts, longer responses, and near-complete coverage of an error taxonomy. In contrast, minimalist prompting focuses on surface-level issues, while teacher feedback prioritises coherence and higher-order writing concerns. Low overlap between human and AI feedback highlights their complementary roles in writing assessment. The findings contribute to ongoing research on AI in education, demonstrating how LLM-generated feedback can support scalable, efficient, and systematic evaluation in Medical English and ESP, while reinforcing the importance of human expertise in pedagogically sensitive contexts.
Evgeni Stanchev studied this question.