Purpose
This study aims to compare large language models (LLMs) with human analysis for rhetorical move detection in journal article abstracts, examining whether automated discourse analysis can complement human efforts in bibliographic metadata creation and information retrieval.

Design/methodology/approach
Using an established data set and BAMRC (Background, Aim, Method, Results, Conclusion) framework from a previous study of social science abstracts, four contemporary LLMs (OpenAI GPT-4o Mini, DeepSeek Chat, Claude 3.5 Haiku and Gemini 2.0 Flash) were compared against the original human analysis. Identical prompts were used across models to ensure comparability.

Findings
LLMs showed modest agreement with human annotations at the abstract level (51.7%–57.9% Jaccard similarity) but substantially higher intermodel agreement (77.0%–86.7%). This pattern indicates convergence toward a distinct LLM-influenced annotation style that diverges systematically from human judgment, including generally higher rates of complete BAMRC structures and more frequent use of the Undefined category at the sentence level (11.6%–29.2% vs 4.8% for humans). Agreement declined sharply at the sentence level, averaging approximately 19%.

Practical implications
LLM-based rhetorical analysis can provide scalable insights into abstract structure and academic discourse patterns, informing quality assessment and indexing practices in digital libraries. API-driven workflows enable low-cost, large-scale metadata analysis as a complement to human expertise, without presuming full replacement of manual processes.

Originality/value
This study offers a multi-LLM comparison on a decade-old, human-annotated social science abstract data set, highlighting both the potential and the limitations of transferring LLM capabilities to rhetorical move analysis and related discourse tasks.
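
For readers unfamiliar with the agreement measure reported in the findings, the following is a minimal sketch of abstract-level Jaccard similarity between two annotators. It assumes each abstract is represented as the set of BAMRC moves an annotator detected in it; the abstract does not spell out the exact computation, so this representation, the function names `jaccard` and `mean_jaccard`, and the example labels are illustrative assumptions rather than the study's method or data.

```python
# Illustrative sketch only: the label sets below are made up, not study data.

def jaccard(moves_a, moves_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two sets of move labels."""
    set_a, set_b = set(moves_a), set(moves_b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty annotations are treated as full agreement.
        return 1.0
    return len(set_a & set_b) / len(union)


def mean_jaccard(pairs):
    """Average Jaccard similarity over (annotator_a, annotator_b) pairs."""
    scores = [jaccard(a, b) for a, b in pairs]
    return sum(scores) / len(scores)


# Hypothetical annotations for two abstracts (human vs. one LLM).
human_1 = ["Aim", "Method", "Results"]
model_1 = ["Background", "Aim", "Method", "Results", "Conclusion"]
human_2 = ["Aim", "Method"]
model_2 = ["Aim", "Method"]

score_1 = jaccard(human_1, model_1)  # 3 shared moves out of 5 in the union
overall = mean_jaccard([(human_1, model_1), (human_2, model_2)])
```

In this hypothetical first pair, the model assigns a complete BAMRC structure where the human found only three moves. That lowers the score even though every human-identified move is also present in the model's output, which is one way a tendency toward complete structures could reduce human-LLM agreement.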
Eungi Kim studied this question.