What question did this study set out to answer?

The study aims to evaluate the effectiveness of two prompting strategies for coding communication data using large language models.

March 8, 2026 · Open Access

Automated coding of communication data using large language models: a comparison of hierarchical and direct prompting strategies

Key Points

The study compared hierarchical and direct prompting strategies for coding communication data.
The coding framework had five main categories and seventeen subcategories.
Coding accuracy was measured against human coding using Cohen’s Kappa and mixed-effects logistic regression.
Both prompting strategies achieved coding accuracy comparable to human–human reliability (κ ≈ 0.57–0.59).
Direct prompting yielded an approximately 18% increase in the odds of agreement with human coding compared to hierarchical prompting.
Hierarchical prompting was more prone to errors when main categories were misclassified, whereas direct prompting showed more stable subcategory coding.
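
The two agreement figures above can be made concrete with a small sketch. It computes Cohen's Kappa from two lists of codes and shows how an odds ratio of about 1.18 translates into a change in agreement probability. The function names, the toy labels, and the 0.50 baseline probability are illustrative assumptions, not values or code from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical codes."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both coders chose the same code.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal counts, summed over codes.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def shift_probability_by_odds_ratio(p, odds_ratio):
    """Return the probability obtained by multiplying the odds of p by odds_ratio."""
    odds = p / (1 - p) * odds_ratio
    return odds / (1 + odds)


# Toy data (hypothetical): human codes versus model codes for four utterances.
human = ["A", "A", "B", "B"]
model = ["A", "B", "B", "B"]
kappa = cohen_kappa(human, model)

# Assumed baseline: if agreement probability were 0.50 under one strategy,
# an odds ratio of 1.18 would raise it to roughly 0.54, not to 0.68.
shifted = shift_probability_by_odds_ratio(0.50, 1.18)
```

The second function is a reminder that an 18% increase in odds is not an 18 percentage point increase in agreement; the size of the shift in probability depends on the baseline.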

Abstract

Coding communication data is essential for assessing 21st-century skills such as collaboration and communication, but large-scale human coding is labor-intensive. Large language models (LLMs) such as ChatGPT offer a scalable alternative, yet their accuracy depends on both coding framework complexity and prompting strategy. Using a communication coding framework with five main categories and seventeen subcategories, we compared two prompting strategies: a hierarchical strategy that first assigns main categories and then codes subcategories, and a direct strategy that directly codes subcategories in a single step. Coding accuracy was evaluated against human coding using Cohen’s Kappa and mixed-effects logistic regression. Both strategies achieved agreement comparable to human–human reliability (overall κ ≈ 0.57–0.59). However, direct prompting consistently outperformed hierarchical prompting, yielding an approximately 18% increase in the odds of agreement. Hierarchical prompting was more susceptible to error propagation when main categories were misclassified, whereas direct prompting produced more stable subcategory coding. These results provide guidance for using LLMs to code communication data under complex coding frameworks.
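
A minimal sketch of the two strategies described in the abstract may help show why hierarchical prompting can propagate errors. It is not the authors' implementation: the framework labels are placeholders (the actual five main categories and seventeen subcategories are not listed here), and `choose` is a hypothetical stand-in for an LLM call that picks one option from a list.

```python
from typing import Callable, Dict, List

# Placeholder two-level framework (hypothetical labels, not the study's codes).
FRAMEWORK: Dict[str, List[str]] = {
    "main_1": ["sub_1a", "sub_1b"],
    "main_2": ["sub_2a", "sub_2b", "sub_2c"],
}

# A chooser takes an utterance and a list of options and returns one option.
# In practice this would wrap an LLM prompt; here it is left abstract.
Chooser = Callable[[str, List[str]], str]


def _checked_choice(utterance: str, options: List[str], choose: Chooser) -> str:
    """Call the chooser and reject answers outside the offered options."""
    answer = choose(utterance, options)
    if answer not in options:
        raise ValueError(f"chooser returned an unknown code: {answer!r}")
    return answer


def code_direct(utterance: str, choose: Chooser,
                framework: Dict[str, List[str]] = FRAMEWORK) -> str:
    """Direct strategy: pick a subcategory from the full list in one step."""
    all_subs = [sub for subs in framework.values() for sub in subs]
    return _checked_choice(utterance, all_subs, choose)


def code_hierarchical(utterance: str, choose: Chooser,
                      framework: Dict[str, List[str]] = FRAMEWORK) -> str:
    """Hierarchical strategy: pick a main category, then one of its subcategories."""
    main = _checked_choice(utterance, list(framework), choose)
    # Only the chosen main category's children are offered in step two, so a
    # wrong main category makes the correct subcategory unreachable.
    return _checked_choice(utterance, framework[main], choose)


def stub_chooser(utterance: str, options: List[str]) -> str:
    """Toy chooser: recognizes "sub_2b" when offered, otherwise takes the first option."""
    if "sub_2b" in options:
        return "sub_2b"
    return options[0]


direct_code = code_direct("example utterance", stub_chooser)
hierarchical_code = code_hierarchical("example utterance", stub_chooser)
```

With this stub, the direct call returns `sub_2b`, while the hierarchical call first picks `main_1` and can then only return one of that category's subcategories. This mirrors the error propagation the abstract attributes to misclassified main categories.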

Authors

Wenju Cui

Educational Testing Service

Jiangang Hao

Yang Jiang

Journals

SHILAP Revista de lepidopterología

Frontiers in Education
