May 9, 2026 · Open Access

STAGE: LLM-Driven Semantic and Topological Augmented Graph Embedding for Text-Attributed Graphs

Key Points

The study aims to improve representation learning for text-attributed graphs (TAGs) by jointly exploiting node text and graph structure.
STAGE is a two-stage framework for TAGs. Stage I uses a frozen large language model offline to generate explanatory text that enriches node attributes.
Stage II performs structure-aware representation learning under a fixed global token budget, combining random-walk-based structural context with graph-conditioned token reduction.
STAGE outperformed strong baselines on seven benchmark datasets under the same evaluation setting.
STAGE maintained favorable efficiency under bounded input-length constraints while preserving informative semantic content.

Abstract

Text-attributed graphs (TAGs) require models to jointly exploit node text and graph structure, yet doing so effectively remains difficult when node text is sparse and the structural context is large. Here, we propose STAGE (Semantic and Topological Augmented Graph Embedding), a two-stage framework for representation learning on TAGs. In Stage I, a frozen large language model is used offline to generate explanatory text that enriches compressed node attributes without introducing online LLM training cost. In Stage II, STAGE performs structure-aware representation learning under a fixed global token budget by combining random-walk-based structural context with graph-conditioned token reduction before PLM encoding. This design preserves informative semantic content while preventing unconstrained sequence expansion. Experiments on seven benchmark datasets show that STAGE consistently outperforms strong baselines under the same evaluation setting and maintains favorable efficiency under bounded input-length constraints.
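
To make the Stage II idea concrete, the following is a minimal sketch of assembling a bounded input sequence from random-walk context. It is not the authors' implementation: the abstract does not specify how graph-conditioned token reduction works, so this sketch swaps in a simple stand-in that truncates each context node's text in proportion to how often random walks visit it. The function names, the half-budget split for the target node, and the default walk settings are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def random_walk_context(adj, start, walk_length, num_walks, rng):
    """Count how often short random walks from `start` visit each other node."""
    visits = Counter()
    for _ in range(num_walks):
        node = start
        for _ in range(walk_length):
            neighbors = adj.get(node, [])
            if not neighbors:
                break  # dead end: stop this walk early
            node = rng.choice(neighbors)
            if node != start:
                visits[node] += 1
    return visits


def build_budgeted_input(tokens, adj, start, budget,
                         walk_length=3, num_walks=20, seed=0):
    """Concatenate the target node's tokens with truncated context tokens.

    The returned sequence never exceeds `budget` tokens.
    Assumption: proportional truncation stands in for the paper's
    graph-conditioned token reduction, which the abstract does not detail.
    """
    rng = random.Random(seed)
    visits = random_walk_context(adj, start, walk_length, num_walks, rng)

    # Hypothetical split: the target node keeps up to half of the budget.
    own = tokens[start][: (budget + 1) // 2]
    context_budget = budget - len(own)
    remaining = context_budget
    total_visits = sum(visits.values())

    sequence = list(own)
    # Most-visited context nodes first; each gets a share of the context
    # budget proportional to its visit count (at least one token).
    for node, count in visits.most_common():
        if remaining <= 0:
            break
        share = max(1, context_budget * count // total_visits)
        piece = tokens[node][: min(share, remaining)]
        sequence.extend(piece)
        remaining -= len(piece)
    return sequence


if __name__ == "__main__":
    # Toy graph and pre-tokenized node text, invented for this example.
    adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
    tokens = {n: [f"{n}{i}" for i in range(10)] for n in adj}
    print(build_budgeted_input(tokens, adj, "a", budget=8))
```

The point of the sketch is the invariant rather than the allocation rule: however many neighbors the walks reach, the sequence handed to the encoder stays within the fixed budget, which is the property the abstract describes as preventing unconstrained sequence expansion.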