Abusive news refers to digital content designed to maximize clicks and advertising revenue through sensational headlines, repetitive postings, or emotionally charged language, at the expense of journalistic integrity. Despite growing concern about its impact on media credibility and public trust, existing detection approaches lack both a systematic categorization and type-specific methodologies. This study addresses that gap by proposing a six-type typology of abusive news (content recycling, keyword insertion, title–body inconsistency, commercial promotion, emotionally stimulating headlines, and automatically generated content) based on five analytical dimensions: content structure, authenticity, algorithmic manipulability, sensationalism, and information-ecosystem impact. We developed type-specific detection pipelines that combine BERT-based embeddings, TF-IDF features, and rule-based indicators, and evaluated them on a large-scale Korean clickbait corpus. The results show that BERT achieves a higher F1-score (0.89) on automatically generated content, while TF-IDF with an SVM provides more stable precision (0.60) on emotionally charged articles under class imbalance. Cross-domain experiments confirm that models trained on diverse, balanced topic sets generalize better than volume-focused models, with diversity improving F1-scores by up to 0.07. BERT models also show higher false positive rates than TF-IDF approaches on repetitive but legitimate content. Together, these findings highlight the importance of type-adaptive architectures and diversity-aware data design in abusive news detection systems.
Choi et al. (Thu,) studied this question.
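To make the idea of "rule-based indicators" more concrete, the following is a minimal sketch of two such signals, one for content recycling and one for title–body inconsistency. It is an illustration under stated assumptions, not the pipeline used in the study: the function names (`recycling_score`, `title_body_overlap`), the trigram shingle size, and the regex tokenizer are all hypothetical choices, and Korean text would normally call for a morphological analyzer instead of simple word splitting.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercased word tokens. Regex splitting is a simplifying assumption;
    it is not the tokenization used in the study."""
    return re.findall(r"\w+", text.lower())


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of word n-grams (shingles) for a text."""
    toks = tokens(text)
    if len(toks) < n:
        # Text shorter than n tokens: treat the whole text as one shingle.
        return {tuple(toks)} if toks else set()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recycling_score(article_a: str, article_b: str, n: int = 3) -> float:
    """Hypothetical content-recycling indicator: shingle overlap of two articles.
    Values near 1.0 suggest one article largely repeats the other."""
    return jaccard(shingles(article_a, n), shingles(article_b, n))


def title_body_overlap(title: str, body: str) -> float:
    """Hypothetical title-body consistency indicator: the fraction of distinct
    title tokens that also appear in the body. Low values suggest a mismatch."""
    title_toks = set(tokens(title))
    if not title_toks:
        return 0.0
    return len(title_toks & set(tokens(body))) / len(title_toks)


if __name__ == "__main__":
    a = "the mayor announced a new transit plan on monday"
    b = "the mayor announced a new transit plan on tuesday"
    print(recycling_score(a, b))  # 6 shared trigrams out of 8 distinct -> 0.75
    print(title_body_overlap("Shocking secret revealed",
                             "The council met to discuss the budget"))  # 0.0
```

Indicators of this kind are cheap to compute and easy to inspect, which is one plausible reason to pair them with learned features. Note that a high recycling score alone cannot separate abusive reposting from legitimate repetition such as routine updates, so any decision threshold would have to be tuned on labeled data.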