What question did this study set out to answer?

The research aims to evaluate context- and content-aware node embedding techniques using textual features in citation graphs.

February 2, 2026 | Open Access

Evaluating node embedding techniques with context and content awareness using textual features

Key Points

Constructed a large citation graph from ArXiv papers.
Evaluated keyword and keyphrase extraction methods on node content.
Applied two node embedding methods, CANE and DeepEmLAN, to link prediction and node classification tasks.
Examined a graph augmentation approach to simulate real-world conditions.
Replacing full-text inputs with keyphrases often maintains or improves performance in link prediction and node classification.
Concise inputs were processed faster than full texts.
Graph augmentation amplified computation time differences between embedding methods.

Abstract

Context- and content-aware node vectorization is the process of representing graph nodes as low-dimensional vectors by taking into account the graph's structure, the content of each node (often textual data), and the context surrounding the node. Graphs whose node content is text are known as textual graphs, and they appear in many domains, such as social networks, recommendation systems, and academic citation networks. In citation graphs, which are the focus of this study, each node represents a research article, the content is the article's text, and edges indicate citation or reference relationships between papers. This study investigates how keyword and keyphrase extraction methods can be used to simplify node content while improving the performance of node embedding methods. Several text extraction methods are evaluated and applied to a large citation graph constructed from ArXiv papers, and their output is assessed using two node embedding methods: CANE and DeepEmLAN. By replacing full-text inputs with concise, descriptive keyphrases, the experiments achieve faster processing while frequently maintaining or even improving performance in link prediction and node classification tasks. The study also investigates a text enrichment strategy that leverages known node category information. Additionally, a graph augmentation approach is examined to better simulate real-world conditions. This preprocessing technique amplifies the gap in computation times between the two node embedding methods when using full-text inputs versus the extracted keywords and keyphrases.
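To make the idea of replacing full-text node content with a short list of keywords concrete, the following is a minimal sketch only. It uses a plain TF-IDF ranking as a stand-in, since the abstract does not name the extraction methods evaluated in the study. The toy graph, the stopword list, and the function names (`tokenize`, `top_keywords`) are all hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy citation graph: node id -> article text (not data from the study).
NODE_TEXT = {
    "p1": "Graph neural networks learn node embeddings from graph structure.",
    "p2": "Keyphrase extraction selects descriptive phrases from article text.",
    "p3": "Node embeddings support link prediction in citation graphs.",
}
# Directed edges (citing paper, cited paper); illustrative only.
EDGES = [("p3", "p1"), ("p3", "p2")]

# Tiny illustrative stopword list (an assumption, not from the study).
STOPWORDS = {"a", "and", "from", "in", "of", "the"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_keywords(node_text, k=3):
    """Return the k highest-scoring TF-IDF terms for every node.

    TF-IDF is a stand-in here; the study's extraction methods may differ.
    """
    docs = {node: tokenize(text) for node, text in node_text.items()}
    n_docs = len(docs)
    # Document frequency: in how many node texts each term occurs.
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))

    keywords = {}
    for node, tokens in docs.items():
        if not tokens:
            keywords[node] = []
            continue
        term_freq = Counter(tokens)
        scores = {
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        }
        # Sort by descending score, then alphabetically, for a stable order.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords[node] = ranked[:k]
    return keywords


if __name__ == "__main__":
    condensed = top_keywords(NODE_TEXT, k=3)
    for node, terms in condensed.items():
        print(node, terms)
```

In a pipeline of the kind the abstract describes, the condensed term lists would take the place of the full texts as node content passed to an embedding method, with the edge list left unchanged.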
