What question did this study set out to answer?

This research aims to improve the detection of cyberbullying incidents in mixed-language dialogues by using advanced generative AI techniques.

March 21, 2026 | Open Access

Early discovery of cyberbullying incidents in multiparty Chinese-English code-mixed colloquial dialogue: A generative AI approach

Key Points

Used long-context large language models for incident detection.
Developed tailored prompt templates to model victim-perpetrator associations.
Fine-tuned two large pretrained models on annotated Chinese-English colloquial dialogues.
Applied low-rank adaptation (LoRA) for efficient model training.
Collected and analyzed 1685 dialogue sessions comprising 14,257 tweets.
Processing each dialogue session as a whole significantly outperformed message-by-message analysis for victim identification.
Victim-perpetrator association improved detection performance across customizable severity levels.
Improved early-detection metrics, including ERDE and F-latency.
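
For readers unfamiliar with ERDE (early risk detection error), the following is a minimal Python sketch of the metric as it is commonly defined in the early-risk-detection literature: false positives and false negatives incur fixed costs, while a true positive is penalized by a sigmoid latency factor that grows with the number of messages read before the decision. The names `Decision`, `latency_cost` and `erde`, and the cost values used in the example, are assumptions for illustration; the paper's exact ERDE configuration is not stated here.

```python
import math
from dataclasses import dataclass


@dataclass
class Decision:
    predicted_positive: bool  # the system flagged the session as an incident
    actual_positive: bool     # gold label for the session
    delay: int                # number of messages read before deciding (k)


def latency_cost(k: int, o: int) -> float:
    """Sigmoid latency factor lc_o(k) = 1 - 1 / (1 + exp(k - o))."""
    x = max(-50, min(50, k - o))  # clamp to avoid overflow in exp
    return 1.0 - 1.0 / (1.0 + math.exp(x))


def erde(decisions, o, c_fp, c_fn=1.0, c_tp=1.0):
    """Mean ERDE_o over sessions; lower is better.

    The cost parameters are illustrative assumptions, not the paper's values.
    """
    if not decisions:
        raise ValueError("at least one decision is required")
    total = 0.0
    for d in decisions:
        if d.predicted_positive and not d.actual_positive:
            total += c_fp                               # false positive
        elif not d.predicted_positive and d.actual_positive:
            total += c_fn                               # false negative
        elif d.predicted_positive and d.actual_positive:
            total += latency_cost(d.delay, o) * c_tp    # late true positive
        # true negatives cost nothing
    return total / len(decisions)


example = [
    Decision(True, True, delay=5),    # true positive, decided at message 5
    Decision(True, False, delay=3),   # false positive
    Decision(False, True, delay=10),  # false negative
    Decision(False, False, delay=10), # true negative
]
score = erde(example, o=5, c_fp=0.2)
```

A true positive issued exactly at `delay == o` costs half of `c_tp`, so earlier correct alerts are rewarded and late ones approach the cost of a miss.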

Abstract

Various machine learning models have been employed to detect aggressive speech and cyberbullying incidents on social media platforms. Previous studies have not addressed processing a lengthy social media session as a whole, because of the short context windows and other technical limitations of early NLP models. In addition, some studies have focused on content classification without identifying the related perpetrators and victims, which are important components for determining cyberbullying severity levels. This paper introduces a victim-perpetrator-category association modeling approach and investigates the practical use of recently developed long-context large language models for early cyberbullying incident detection. Tailored prompt templates with proper stop phrases were designed to explicitly model the associations between victims and perpetrators, enabling more accurate prediction of cyberbullying incidents. Two large pretrained models, Llama 3.1 8B and Qwen 2.5 14B, were fine-tuned using low-rank adaptation on 1685 manually annotated multiparty Chinese-English Cantonese colloquial dialogue sessions comprising a total of 14,257 tweets. Processing all messages of a dialogue session together is shown to provide significant performance advantages over handling messages individually in the victim identification task. The explicit victim-perpetrator-category association is empirically shown to improve early cyberbullying detection performance in terms of ERDE and F-latency across all customizable severity levels, which are gauged by different thresholds of insult frequency, number of perpetrators and power imbalance.

• Explicit victim-perpetrator-category association for early bullying detection.
• Identify cases by multi-thresholds: frequency, participants and power imbalance.
• Demonstrate practical prompt templates for advanced generative LLMs.
• Efficient fine-tuning architecture with rank-stabilized low-rank adaptation method.
• Annotate 1685 sessions of Chinese-English code-mixed colloquial language dialogues.
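
To illustrate how threshold-based severity levels could turn victim-perpetrator-category associations into an early alert, here is a minimal Python sketch. It assumes the model's output has already been parsed into (message index, perpetrator, victim, category) records and flags the first message at which some victim has received at least `min_insults` insults from at least `min_perpetrators` distinct perpetrators. The names `Association` and `first_incident_index` are hypothetical, and the power-imbalance threshold is omitted because the abstract does not say how it is measured; this is not the authors' implementation.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Optional


class Association(NamedTuple):
    message_index: int  # position of the message in the dialogue session
    perpetrator: str    # speaker who sent the insulting message
    victim: str         # participant targeted by the message
    category: str       # label assigned to the message (e.g. "insult")


def first_incident_index(
    associations: Iterable[Association],
    min_insults: int,
    min_perpetrators: int,
) -> Optional[int]:
    """Return the earliest message index at which any victim meets both
    thresholds, or None if no victim ever does."""
    insult_count = defaultdict(int)   # victim -> insults received so far
    perpetrators = defaultdict(set)   # victim -> distinct perpetrators so far
    for a in sorted(associations, key=lambda a: a.message_index):
        insult_count[a.victim] += 1
        perpetrators[a.victim].add(a.perpetrator)
        if (insult_count[a.victim] >= min_insults
                and len(perpetrators[a.victim]) >= min_perpetrators):
            return a.message_index
    return None


session = [
    Association(0, "A", "V", "insult"),
    Association(2, "A", "V", "insult"),
    Association(3, "B", "V", "insult"),
    Association(4, "B", "W", "insult"),
]
strict = first_incident_index(session, min_insults=3, min_perpetrators=2)
lenient = first_incident_index(session, min_insults=2, min_perpetrators=1)
```

Raising either threshold delays or suppresses the alert, which is the trade-off that latency-aware metrics such as ERDE and F-latency are designed to capture.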

Authors

Carlin Chun Fai Chu

Calvin Chun Ho Tong

Chun Hung Chiu

Journals

Intelligent Systems with Applications

Institutions

University of Hong Kong

Sun Yat-sen University

Hang Seng University of Hong Kong
