This paper systematically evaluates the reliability of large language models (LLMs) in news headline generation. As artificial intelligence becomes fully integrated into news production, the factual accuracy and emotional appropriateness of AI-generated content have drawn sustained attention. Existing research, however, lacks a systematic quality evaluation framework that distinguishes between news types. To address this gap, this study constructs news corpora in four categories (science and technology, finance, entertainment, and society), generates headlines through the qwen-max API, and analyzes them quantitatively along three dimensions: factuality, sentiment tendency, and keyword coverage. The empirical results show that finance headlines exhibit the largest factual deviation, entertainment headlines the most pronounced sentiment bias, and science and technology headlines the weakest keyword coverage. These findings reveal how the generative model's performance differs across text domains and provide an empirical basis for news organizations to establish category-specific quality control mechanisms. Future work will extend the evaluation to multimodal news generation and explore domain-adaptive optimization of headline generation.
Longying Liao studied this question.
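Of the three indicators named in the abstract, keyword coverage is the simplest to illustrate. The abstract does not define the metric, so the following is only a minimal sketch under an assumed definition: the fraction of reference keywords from the source article that appear in the generated headline, matched as case-insensitive substrings. The function name `keyword_coverage` and the sample inputs are hypothetical and are not taken from the paper.

```python
def keyword_coverage(headline: str, keywords: list[str]) -> float:
    """Return the share of reference keywords found in the headline.

    Assumed definition (not from the paper): a keyword counts as covered
    when it occurs in the headline as a case-insensitive substring.
    """
    if not keywords:
        # No reference keywords: define coverage as 0.0 to avoid division by zero.
        return 0.0
    text = headline.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)


# Hypothetical example: two of the four reference keywords appear in the headline.
score = keyword_coverage(
    "Qwen model tops benchmark",
    ["qwen", "benchmark", "GPU", "latency"],
)
print(score)
```

Substring matching is a deliberately crude choice. A real evaluation would likely need tokenization or word segmentation, particularly for Chinese-language news, and possibly synonym handling.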