What question did this study set out to answer?

The study aims to categorize abusive news types and develop specific detection methodologies.

March 28, 2026 | Open Access

Defining Abusive News Categories: Proposing a Detection Model for Digital Media Integrity

Key Points

Set out to categorize abusive news types and develop type-specific detection methodologies
Proposed a six-type typology of abusive news
Developed type-specific detection pipelines using BERT embeddings, TF-IDF features, and rule-based indicators
Evaluated the detection models on a large-scale Korean clickbait corpus
Conducted cross-domain experiments to assess model generalization
BERT achieved an F1-score of 0.89 on automatically generated content
TF-IDF with SVM provided a stable precision of 0.60 on emotionally charged articles under class imbalance
Training on diverse, balanced topic sets improved F1-scores by up to 0.07
BERT models had higher false positive rates than TF-IDF approaches on repetitive legitimate content
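
The key points report precision, F1-score, and false positive rate. As a reminder of how these metrics relate, the following minimal sketch computes them from binary confusion counts. The `Confusion` class, the function names, and the example counts are illustrative assumptions and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion counts for one abusive-news type (positive = abusive)."""
    tp: int  # abusive articles correctly flagged
    fp: int  # legitimate articles wrongly flagged
    tn: int  # legitimate articles correctly passed
    fn: int  # abusive articles missed


def precision(c: Confusion) -> float:
    # Share of flagged articles that really are abusive.
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Confusion) -> float:
    # Share of abusive articles that were flagged.
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1(c: Confusion) -> float:
    # Harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def false_positive_rate(c: Confusion) -> float:
    # Share of legitimate articles that were wrongly flagged.
    return c.fp / (c.fp + c.tn) if (c.fp + c.tn) else 0.0


def accuracy(c: Confusion) -> float:
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


# Hypothetical, imbalanced counts (50 abusive vs. 950 legitimate articles).
example = Confusion(tp=30, fp=10, tn=940, fn=20)

if __name__ == "__main__":
    print(f"precision={precision(example):.2f} recall={recall(example):.2f}")
    print(f"f1={f1(example):.2f} fpr={false_positive_rate(example):.3f}")
    print(f"accuracy={accuracy(example):.2f}")
```

With these made-up counts, accuracy is high because legitimate articles dominate, while precision, recall, and F1 are noticeably lower. This is one reason imbalanced detection tasks are usually reported with precision and F1 rather than accuracy alone.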

Abstract

Abusive news refers to digital content designed to maximize clicks and advertising revenue through sensational headlines, repetitive postings, or emotionally charged language rather than to uphold journalistic integrity. Despite growing concerns about its impact on media credibility and public trust, existing detection approaches lack systematic categorization and type-specific methodologies. This study addresses this gap by proposing a six-type typology of abusive news (content recycling, keyword insertion, title–body inconsistency, commercial promotion, emotionally stimulating headlines, and automatically generated content) based on five analytical dimensions: content structure, authenticity, algorithmic manipulability, sensationalism, and information-ecosystem impact. We developed type-specific detection pipelines combining BERT-based embeddings, TF-IDF features, and rule-based indicators, and evaluated them on a large-scale Korean clickbait corpus. Results show that BERT achieves the higher F1-score (0.89) on automatically generated content, while TF-IDF with SVM provides more stable precision (0.60) on emotionally charged articles under class imbalance. Cross-domain experiments confirm that models trained on diverse, balanced topic sets generalize better than volume-focused models, with diversity improving F1-scores by up to 0.07. BERT models also show higher false positive rates than TF-IDF approaches on repetitive legitimate content, which highlights the importance of type-adaptive architectures and diversity-aware data design in abusive news detection systems.
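
To make the idea of a TF-IDF-based, rule-style indicator more concrete, the sketch below scores how well a headline matches its article body using cosine similarity over TF-IDF weights, then flags low-similarity pairs as possible title–body inconsistency. This is not the study's pipeline: the function names, the whitespace-style tokenizer, the smoothed IDF formula, the threshold of 0.1, and the example articles are all assumptions made for illustration. Korean text in particular would likely need a morphological analyzer in place of the simple tokenizer.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase word tokens are good enough for a demo.
    return re.findall(r"\w+", text.lower())


def idf_table(docs: list[list[str]]) -> dict[str, float]:
    # Smoothed inverse document frequency over tokenized documents.
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Raw term frequency times IDF; unseen terms get weight 0.
    tf = Counter(tokens)
    return {tok: freq * idf.get(tok, 0.0) for tok, freq in tf.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def title_body_similarity(title: str, body: str, idf: dict[str, float]) -> float:
    return cosine(tfidf(tokenize(title), idf), tfidf(tokenize(body), idf))


def flag_inconsistent(title: str, body: str, idf: dict[str, float],
                      threshold: float = 0.1) -> bool:
    # Hypothetical rule: very low similarity suggests the headline
    # does not describe the body. The threshold is an arbitrary choice.
    return title_body_similarity(title, body, idf) < threshold


# Made-up example articles as (title, body) pairs.
articles = [
    ("City council approves new budget",
     "The city council approved the new budget after a long debate on spending."),
    ("You will not believe this secret",
     "The central bank kept interest rates unchanged on Tuesday."),
]

# Build the IDF table from every title and body in the toy corpus.
idf = idf_table([tokenize(text) for pair in articles for text in pair])

if __name__ == "__main__":
    for title, body in articles:
        print(title, "->", flag_inconsistent(title, body, idf))
```

In this toy corpus the second headline shares no terms with its body, so its similarity is zero and it is flagged, while the first headline is not. A real system would presumably tune the threshold on labeled data and combine this signal with the other features described above.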
