Abstract The rapid assessment of scientific research trends requires continuous analysis of academic literature to support impactful contributions. Questions persist regarding how to identify the most recent and influential papers to read, which topics are worth exploring, and, notably, whether a paper will resonate with the scientific community. While citation counts have long been a conventional metric for assessing research quality, the underlying textual semantics of a paper play a pivotal role in shaping citation patterns. However, tracking and modelling semantic content (i.e., the meaning embedded in the text) remains a challenge in natural language processing, particularly when identifying trends in the growing volume of publications. In this paper, we introduce new theoretical foundations and experimental frameworks for vector-based representations of text semantics, a Meaning Space, grounded in principles from information theory. We explore how these novel computational approaches and state-of-the-art language models such as Doc2Vec and variants of Bidirectional Encoder Representations from Transformers (BERT) have advanced automated semantic analysis and enabled machine learning-based citation prediction using scientific abstracts. Focusing on citation classification, we reveal the significance of text semantics in predicting citation counts and highlight how informational semantics offers an effective means of predicting the scientific impact of papers.
Süzen et al. studied this question.