This article investigates unmarked automatically generated content in Russian-language online news media and addresses the methodological challenges of detecting it reliably. With the rapid adoption of generative artificial intelligence, particularly large language models (LLMs), in journalistic workflows, questions of transparency, authorship, and editorial responsibility have become increasingly salient. The study reviews and systematizes existing approaches to binary text classification that distinguish human-written from AI-generated content, with particular attention to their applicability in the Russian linguistic and media context. Empirically, the research applies a supervised detection framework to a large corpus of news texts collected from 20 leading Russian online media outlets with the highest national traffic shares between March and May 2025, selected using the SimilarWeb analytical service. The detection method fine-tunes a neural network built on the RuRoBERTa architecture, adapted for Russian-language processing, on a combination of annotated corpora and controlled synthetic paraphrases of real news articles. To account for heterogeneity within documents, the analysis classifies individual fragments and then aggregates the fragment-level results through decision rules. The scientific novelty of the study lies in its comprehensive and reproducible approach to the quantitative assessment of unmarked AI-generated content in Russian news media. The findings indicate that only 18% of the analyzed platforms exceed the statistically significant threshold of 17.1% for AI-generated material, while the remaining 82% stay below this level, suggesting the continued dominance of traditional editorial practices. These results contribute to ongoing discussions in digital journalism, media ethics, and information security, and provide an empirical foundation for future research on AI transparency and content governance.
Vasilev et al. studied this question.
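The fragment-level classification followed by aggregated decision rules, as described in the abstract, can be illustrated with a minimal sketch. The abstract does not specify the fragment size, the aggregation rule, or the score cutoffs, so everything below is an assumption made for illustration: the fixed 50-word fragments, the "share of flagged fragments" rule, both 0.5 thresholds, and all function names are hypothetical. The `toy_score` function is a stand-in for a fine-tuned classifier and is not the RuRoBERTa model used in the study.

```python
from typing import Callable, List


def split_into_fragments(text: str, max_words: int = 50) -> List[str]:
    """Split a document into consecutive fragments of at most max_words words.

    The 50-word default is an illustrative assumption, not a value from the study.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def classify_document(
    text: str,
    score_fragment: Callable[[str], float],
    fragment_threshold: float = 0.5,   # assumed cutoff on the per-fragment score
    document_threshold: float = 0.5,   # assumed share of flagged fragments
    max_words: int = 50,
) -> bool:
    """Flag a document when enough of its fragments are scored as AI-generated."""
    fragments = split_into_fragments(text, max_words)
    if not fragments:
        return False
    flagged = sum(1 for f in fragments if score_fragment(f) >= fragment_threshold)
    return flagged / len(fragments) >= document_threshold


def outlet_share(documents: List[str], score_fragment: Callable[[str], float]) -> float:
    """Fraction of an outlet's documents that the rule above flags."""
    if not documents:
        return 0.0
    flagged = sum(classify_document(d, score_fragment) for d in documents)
    return flagged / len(documents)


def toy_score(fragment: str) -> float:
    """Hypothetical stand-in scorer; a real pipeline would call a trained model."""
    return 0.9 if "synthetic" in fragment else 0.1


if __name__ == "__main__":
    docs = ["synthetic " * 100, "human " * 100, "human " * 100, "human " * 100]
    share = outlet_share(docs, toy_score)
    # Compare the outlet's share with the 17.1% threshold reported in the abstract.
    print(share, share > 0.171)
```

Other aggregation rules, such as averaging fragment scores or requiring a run of consecutive flagged fragments, would fit the same structure; the share-based rule is used here only because it is the simplest to state.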