Although prior research indicates that expert reviewers identify AI-generated academic texts with low accuracy, the quantitative analysis presented in this paper reveals marked, measurable differences between human-authored and AI-generated works. We investigate this duality in the context of Hungarian as an under-represented training language. On the one hand, we perform a quantitative text analysis of the lexical, syntactic, and stylistic features of Hungarian-language academic essays written by human authors (doctoral candidates) and of essays generated by Google Gemini, OpenAI GPT, and Anthropic Claude models. On the other hand, using a blind experimental design, we analyze how human reviewers (N = 391) with varying levels of expertise perceive and assess the quality of the texts. The quantitative analysis showed that AI-generated essays are characterized by lower lexical diversity and an absence of epistemic markers. The human evaluation yielded complex results: reviewers active in academic practice (members of the academically active and academically passive clusters) acknowledged the formal and logical precision of the AI-generated texts, yet they noted a lack of originality and critical depth. Reviewers less engaged with academic practice (members of the non-academic and inactive clusters), in contrast, were primarily persuaded by the more natural style and originality of the human-authored texts. The findings suggest that, with moderate-level prompting and the provision of source literature, an AI-generated essay that reviewers deem superior to human work in certain aspects, such as formal and logical precision, can be created in a few hours. Furthermore, our findings suggest that with targeted, more sophisticated prompt engineering, the quality gap between AI-generated and human-authored texts could narrow further. These findings have significant implications for assessment methods in higher education and for the regulation of academic publishing.
Turós et al. studied this question.