What question did this study set out to answer?

The research aims to develop a memory-efficient method for constructing token graphs from transformers that capture long-distance relationships between words.

April 4, 2026

Open Access

A Straightforward Approach to Construct ‘Lightweight’ Token Graphs from Transformers

Key Points

Proposes a memory-efficient method that builds token graphs from the attention coefficients inside transformers.
Uses a chunk-and-stride mechanism to fully encode documents longer than the transformer’s context window.
Compares the method against fixed-size sliding-window graph construction, GNN-based state-of-the-art techniques, and fine-tuning.
Surpasses sliding windows under the same model architecture, particularly on longer documents, and reaches statistically identical performance on shorter ones.
Surpasses existing GNN-based state-of-the-art techniques.
Performs marginally below fine-tuning on documents that fit inside the pre-trained model’s context window, but surpasses it on documents that do not.
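
The contrast between sliding-window graphs and attention-based graphs can be illustrated with a small sketch. This summary does not say how attention coefficients are turned into edges, so the example assumes a simple rule (each token keeps an edge to the `top_k` tokens it attends to most) and a hand-written toy attention matrix. The function names, the top-k rule, and the numbers are all hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def sliding_window_edges(num_tokens: int, window: int) -> Set[Edge]:
    """Connect each token to the tokens at most `window` positions after it."""
    edges = set()
    for i in range(num_tokens):
        for j in range(i + 1, min(i + window, num_tokens - 1) + 1):
            edges.add((i, j))
    return edges


def attention_edges(attention: List[List[float]], top_k: int) -> Set[Edge]:
    """Connect each token to the `top_k` other tokens it attends to most.

    attention[i][j] is assumed to be the weight token i places on token j.
    Top-k selection is an illustrative assumption, not the paper's stated rule.
    """
    edges = set()
    for i, row in enumerate(attention):
        candidates = [j for j in range(len(row)) if j != i]
        candidates.sort(key=lambda j: row[j], reverse=True)
        for j in candidates[:top_k]:
            # Store edges as undirected pairs with the smaller index first.
            edges.add((min(i, j), max(i, j)))
    return edges


# Toy 5-token example (made-up weights): token 0 attends strongly to token 4.
toy_attention = [
    [0.10, 0.20, 0.05, 0.05, 0.60],
    [0.30, 0.10, 0.40, 0.10, 0.10],
    [0.10, 0.50, 0.10, 0.20, 0.10],
    [0.10, 0.10, 0.50, 0.10, 0.20],
    [0.60, 0.05, 0.05, 0.20, 0.10],
]

window_graph = sliding_window_edges(num_tokens=5, window=1)
attention_graph = attention_edges(toy_attention, top_k=1)
```

In this toy case the window graph only links neighbouring tokens, while the attention graph contains the long-distance edge between tokens 0 and 4 with fewer edges overall. That is the kind of relationship a fixed-size window cannot represent.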

Abstract

Even though transformers are the standard for extracting information from text, they have considerable limitations that make them sub-optimal for certain tasks. Graph neural networks (GNNs) are an alternative that mitigates some of these limitations, with recent work focusing on combining information from pre-trained language models (PLMs) with graph structures. However, most existing methods use fixed-size sliding windows to construct graphs, ignoring long-distance relationships between words. We propose a memory-efficient method to construct token graphs that exploits semantic and relational information inside transformers via their attention coefficients, mitigating the limitations of using sliding windows. Additionally, each graph can fully encode documents longer than the transformer’s context window via a chunk-and-stride mechanism, while also reducing memory usage. Our method surpasses the performance of sliding windows when both approaches are compared under the same model architecture, particularly in longer documents, and reaches statistically identical performance in shorter ones. It also surpasses GNN-based state-of-the-art techniques. When compared to fine-tuning, our method reaches marginally lower performance in documents that fit inside the pre-trained model’s context window. However, it surpasses fine-tuning in documents that do not fit inside this window, as a result of fully encoding each document using our chunk-and-stride mechanism.
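
The abstract does not spell out the chunk-and-stride mechanism, so the following is only a minimal sketch of the general idea: split a long token sequence into overlapping chunks that each fit a context window, so that every token is encoded at least once. The function name `chunk_spans` and the parameters `chunk_size` and `stride` are hypothetical, and how the paper merges per-chunk attention into a single graph is not covered here.

```python
from typing import List, Tuple


def chunk_spans(num_tokens: int, chunk_size: int, stride: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans that together cover every token.

    Consecutive chunks start `stride` tokens apart, so they overlap by
    `chunk_size - stride` tokens whenever stride < chunk_size.
    """
    if chunk_size <= 0 or not 0 < stride <= chunk_size:
        raise ValueError("need chunk_size > 0 and 0 < stride <= chunk_size")
    spans = []
    start = 0
    while True:
        end = min(start + chunk_size, num_tokens)
        spans.append((start, end))
        if end >= num_tokens:
            break
        start += stride
    return spans


# A 10-token document with a hypothetical 4-token context window and stride 3.
spans = chunk_spans(num_tokens=10, chunk_size=4, stride=3)

# Every token index falls inside at least one chunk.
covered = {i for start, end in spans for i in range(start, end)}
```

With these toy values, neighbouring chunks share one token, which gives each chunk some context from the one before it. A smaller stride increases the overlap at the cost of more chunks to encode.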

Authors

João Pimentel

Joana Amorim

Frank Rudzicz

Journals

Computational Linguistics

Institutions

Dalhousie University

Vector Institute
