Coding communication data is essential for assessing 21st-century skills such as collaboration and communication, but large-scale human coding is labor-intensive. Large language models (LLMs) such as ChatGPT offer a scalable alternative, yet their accuracy depends on both the complexity of the coding framework and the prompting strategy. Using a communication coding framework with five main categories and seventeen subcategories, we compared two prompting strategies: a hierarchical strategy that first assigns main categories and then codes subcategories, and a direct strategy that codes subcategories in a single step. Coding accuracy was evaluated against human coding using Cohen’s kappa and mixed-effects logistic regression. Both strategies achieved agreement comparable to human–human reliability (overall κ ≈ 0.57–0.59). However, direct prompting consistently outperformed hierarchical prompting, yielding an approximately 18% increase in the odds of agreement with human coding. Hierarchical prompting was more susceptible to error propagation when main categories were misclassified, whereas direct prompting produced more stable subcategory coding. These results provide guidance for using LLMs to code communication data under complex coding frameworks.
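As an illustration of the agreement statistic named in the abstract, the following is a minimal sketch of Cohen's kappa for two sets of categorical codes. The function name `cohens_kappa` and the toy labels are hypothetical and chosen only for this example. They are not the study's data, coding framework, or analysis code, and the mixed-effects logistic regression is not reproduced here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same code by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: for each code, multiply the two raters' marginal
    # proportions, then sum over codes.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical code throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels (assumption for illustration, not study data).
human = ["A", "A", "B", "B", "C", "C", "A", "B"]
llm = ["A", "A", "B", "C", "C", "C", "A", "A"]
kappa = cohens_kappa(human, llm)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw percent agreement for the agreement expected by chance, which matters when some codes are much more frequent than others. Separately, an 18% increase in the odds of agreement corresponds to an odds ratio of about 1.18.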
Wenju Cui
Educational Testing Service
Jiangang Hao
Yang Jiang
SHILAP Revista de lepidopterología
Frontiers in Education
Cui et al. studied this question.
synapsesocial.com/papers/69ada873bc08abd80d5bb6ae — DOI: https://doi.org/10.3389/feduc.2026.1764154