ABSTRACT Large language models (LLMs) can generate text that resembles L2 learner writing at requested proficiency levels, but whether such texts adapt to communicative contexts as human learners do remains unclear. This study compared LLM‐generated and human second‐language writing across three task‐defined categories (informal–personal, semi‐formal–expository, formal–argumentative) and four CEFR levels (A2–C1) using Biber's multi‐dimensional analysis (MDA). Six instruction‐tuned models responded to Write n = 4724 LLM), with EFCAMDAT as a human baseline. Under CEFR‐only prompting, LLM outputs shifted on MDA dimensions tied to informational density and explicitness, but showed weaker, differently directed, or less variable differentiation on dimensions tied to persuasion and elaboration, especially in formal–argumentative tasks. The largest observed gap was at C1, where CEFR descriptors most emphasize register flexibility. LLM outputs varied less than authentic learner texts and clustered near the center of the human distributional space. The broad pattern recurred across all six model families. CEFR‐conditioned generation can reproduce some structural signals of proficiency while showing limited human‐like register differentiation in MDA space, a pattern we interpret as form without function.
Carlo et al. studied this question.