Abstract

Embedding news articles is a crucial tool in several fields, including media bias detection, fake news identification, and news recommendation. However, existing news embedding methods are not optimized to capture the latent context of news events. Most rely on full-text information and do not produce time-relevant embeddings. In this paper, we propose a novel, lightweight method that optimizes news embedding generation by focusing on the entities and themes mentioned in articles and their historical connections to specific events. The method has three stages. First, we extract events, entities, and themes from news articles. Second, we generate periodic time embeddings for themes and entities by training time-separated GloVe models on current and historical data. Third, we concatenate the news embeddings generated by two distinct approaches: Smooth Inverse Frequency (SIF) for article-level vectors and Siamese Neural Networks for embeddings that carry nuanced event-related information. We use over 850,000 news articles and one million events from the GDELT project to test and evaluate our method, and we validate it through a comparative analysis of different news embedding generation methods. Our experiments demonstrate that our approach outperforms state-of-the-art methods on shared event detection tasks.
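
To make the third stage more concrete, the following is a minimal sketch of SIF-style article vectors concatenated with a second, event-oriented embedding. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the function names `sif_embeddings` and `concat_embeddings`, the smoothing constant `a = 1e-3`, and the input format (token lists plus a dictionary of pretrained vectors) are all hypothetical, and `event_vecs` merely stands in for the output of the Siamese network, which is not sketched here. The removal of the first principal direction follows the commonly described SIF recipe.

```python
from collections import Counter

import numpy as np


def sif_embeddings(docs, word_vecs, a=1e-3):
    """SIF-style document vectors (illustrative sketch).

    docs: list of token lists (e.g. entities and themes per article).
    word_vecs: dict mapping token -> 1-D numpy array (e.g. GloVe vectors).
    a: smoothing constant; 1e-3 is an assumed default, not a tuned value.
    """
    # Estimate unigram probabilities from the corpus itself.
    counts = Counter(t for doc in docs for t in doc if t in word_vecs)
    total = sum(counts.values())
    dim = len(next(iter(word_vecs.values())))

    emb = np.zeros((len(docs), dim))
    for i, doc in enumerate(docs):
        toks = [t for t in doc if t in word_vecs]
        if not toks:
            continue  # leave an all-zero row for documents with no known tokens
        # Rare tokens receive weights close to 1, frequent tokens are down-weighted.
        weights = np.array([a / (a + counts[t] / total) for t in toks])
        vecs = np.stack([word_vecs[t] for t in toks])
        emb[i] = weights @ vecs / len(toks)

    # Remove the projection onto the first right singular vector.
    _, _, vt = np.linalg.svd(emb, full_matrices=False)
    pc = vt[0]
    return emb - np.outer(emb @ pc, pc)


def concat_embeddings(article_vecs, event_vecs):
    """Concatenate article-level and event-level vectors row by row."""
    return np.concatenate([article_vecs, event_vecs], axis=1)
```

In this sketch, each row of the concatenated matrix is the final embedding of one article: the first block carries the frequency-weighted content signal, and the second block carries whatever event-related signal the second model provides.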
Ishlach et al. (Wed,) studied this question.