What does this research mean for the field?

For a hybrid propagation layer that combines global attention with graph-based propagation, multi-layer graph Transformers can be reduced to single-layer models without sacrificing representation capacity. This enables approximation-free linear scaling with graph size and orders-of-magnitude inference acceleration on medium-sized graphs. Novelty: methodological. Consensus alignment: challenges consensus.

What question did this study set out to answer?

The aim is to assess whether deep attention is essential for effective graph representation learning.

May 30, 2026

SGFormer: Simplifying and Scaling Graph Transformers with Single-Layer Attention and Approximation-Free Linear Complexity

Key Points

Developed SGFormer utilizing single-layer global attention with linear complexity.
Analyzed the necessity of multi-layer architectures in graph transformers.
Conducted empirical comparisons on medium-sized graphs and the ogbn-papers100M dataset.
SGFormer yields orders-of-magnitude inference acceleration over state-of-the-art Transformers on medium-sized graphs.
It scales exactly linearly with graph size, without approximation.
It accommodates all-pair interactions without stochastic approximations or restricted receptive fields.

Abstract

Learning representations on large graphs is a fundamental challenge due to complex inter-dependencies. While Transformers excel on small graphs via global attention, existing architectures often mirror large language models by stacking deep attention layers. This design philosophy restricts the scalability of Transformers on large graphs, because the inter-dependent nature of graph data makes it non-trivial to losslessly partition a graph for modern accelerators. We provide a theoretical reassessment of whether deep attention is a necessity. Our analysis shows that for a generic hybrid propagation layer that combines global attention and graph-based propagation, multi-layer models can be reduced to one-layer counterparts without sacrificing representation capacity. Guided by these insights, we propose the Simplified Single-Layer Graph Transformer (SGFormer), which uses single-layer global attention with approximation-free linear complexity. Unlike scalable Transformers that rely on stochastic approximations or restricted receptive fields, SGFormer scales exactly linearly with respect to graph size and requires no approximation to accommodate all-pair interactions. Empirically, it yields orders-of-magnitude inference acceleration over state-of-the-art Transformers on medium-sized graphs and scales smoothly to the web-scale ogbn-papers100M dataset (0.1B nodes) on a single GPU with 24GB memory. Our results suggest that principled simplification is a highly effective path toward powerful, scalable foundation models for large-graph learning.
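The idea of all-pair attention with exact linear complexity can be illustrated with a small sketch. It assumes a softmax-free attention weight of the form w_ij = 1 + (q_i . k_j) / N with unit-norm queries and keys; this form is chosen here for illustration and is not confirmed by the abstract as SGFormer's exact attention function. Because the weight is linear in q_i, the sum over all N nodes can be regrouped as Q (K^T V), which avoids building the N x N matrix while giving the same result. The helper names (unit_rows, quadratic_attention, linear_attention) are hypothetical.

```python
import math

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Dense matrix product of two list-of-rows matrices."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def unit_rows(m):
    """Scale every (nonzero) row to unit L2 norm so that |q_i . k_j| <= 1."""
    out = []
    for row in m:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row])
    return out

def quadratic_attention(q, k, v):
    """Reference version: materializes all N x N weights, O(N^2 d) time."""
    n = len(q)
    scores = matmul(q, transpose(k))
    out = []
    for i in range(n):
        # Assumed weight form: w_ij = 1 + q_i . k_j / N (non-negative for unit rows).
        weights = [1.0 + s / n for s in scores[i]]
        denom = sum(weights)
        out.append([sum(w * row[c] for w, row in zip(weights, v)) / denom
                    for c in range(len(v[0]))])
    return out

def linear_attention(q, k, v):
    """Same output via Q (K^T V): no N x N matrix, O(N d^2) time."""
    n = len(q)
    kv = matmul(transpose(k), v)            # d x d_v summary of all keys and values
    k_sum = [sum(col) for col in zip(*k)]   # sum over nodes of k_j
    v_sum = [sum(col) for col in zip(*v)]   # sum over nodes of v_j
    out = []
    for q_i in q:
        denom = n + sum(a * b for a, b in zip(q_i, k_sum)) / n
        numer = [v_sum[c] + sum(a * kv_row[c] for a, kv_row in zip(q_i, kv)) / n
                 for c in range(len(v_sum))]
        out.append([x / denom for x in numer])
    return out

# Toy check on 4 nodes: both routes should agree up to floating-point error.
q = unit_rows([[1.0, 2.0], [0.5, -1.0], [3.0, 0.1], [-2.0, 1.0]])
k = unit_rows([[0.3, 0.7], [1.0, 1.0], [-1.0, 2.0], [0.2, -0.4]])
v = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 2.0, 0.0], [1.0, -1.0, 3.0]]
exact = quadratic_attention(q, k, v)
fast = linear_attention(q, k, v)
print(max(abs(a - b) for ra, rb in zip(exact, fast) for a, b in zip(ra, rb)))
```

The linear route only ever stores a d x d_v summary plus two column sums, so memory and time grow with N rather than N squared, and nothing is sampled or truncated along the way.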

Authors

Qitian Wu

Broad Institute

Kai Yang

Shanghai Jiao Tong University

Hengrui Zhang

University of Illinois Chicago

Journals

IEEE Transactions on Pattern Analysis and Machine Intelligence

Institutions

Broad Institute

University of Hong Kong

University of Illinois Chicago
