Abstract Even though transformers are the standard for extracting information from text, they have considerable limitations that make them sub-optimal for certain tasks. Graph neural networks (GNNs) are an alternative that mitigates some of these limitations, with recent work focusing on combining information from pre-trained language models (PLMs) with graph structures. However, most existing methods use fixed-size sliding windows to construct graphs, ignoring long-distance relationships between words. We propose a memory-efficient method to construct token graphs that exploits semantic and relational information inside transformers via their attention coefficients, mitigating the limitations of sliding windows. Additionally, each graph can fully encode documents longer than the transformer’s context window via a chunk-and-stride mechanism, while also reducing memory usage. Our method surpasses the performance of sliding windows when both approaches are compared under the same model architecture, particularly on longer documents, and reaches statistically identical performance on shorter ones. It also surpasses GNN-based state-of-the-art techniques. Compared to fine-tuning, our method reaches marginally lower performance on documents that fit inside the pre-trained model’s context window. However, it surpasses fine-tuning on documents that do not fit inside this window, a result of fully encoding each document with our chunk-and-stride mechanism.
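The abstract names two ingredients, chunk-and-stride encoding and graph edges taken from attention coefficients, without giving their details. The sketch below is one plausible reading, not the authors' implementation. The helper names `chunk_spans` and `top_k_edges`, the head-averaged attention matrix, and the top-k selection rule are all assumptions made for illustration.

```python
from typing import List, Tuple


def chunk_spans(n_tokens: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) token spans that together cover the whole document.

    Hypothetical helper: spans start `stride` tokens apart and hold at most
    `window` tokens, so neighbouring chunks overlap by `window - stride`.
    """
    if not 0 < stride <= window:
        raise ValueError("need 0 < stride <= window")
    if n_tokens <= 0:
        return []
    spans = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:
            break
        start += stride
    return spans


def top_k_edges(
    attention: List[List[float]], offset: int, k: int
) -> List[Tuple[int, int, float]]:
    """Pick each token's k highest-weight attention targets as directed edges.

    Assumption: attention[i][j] is a head-averaged weight from token i to
    token j inside one chunk. `offset` shifts chunk positions to document
    positions. Self-loops are dropped, so a token may get fewer than k edges.
    """
    edges = []
    for i, row in enumerate(attention):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        for j in ranked[:k]:
            if j != i:
                edges.append((offset + i, offset + j, row[j]))
    return edges
```

Under these assumptions, a document longer than the context window would be split with `chunk_spans`, each chunk passed through the transformer, and the per-chunk edges from `top_k_edges` merged into a single document graph. Unlike a fixed sliding window, an edge can then connect any two tokens that share a chunk, however far apart they sit within it.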
João Pimentel
Joana Amorim
Frank Rudzicz
Computational Linguistics
Dalhousie University
Vector Institute
Pimentel et al. studied this question.
synapsesocial.com/papers/69d0af83659487ece0fa58a7 — DOI: https://doi.org/10.1162/coli.a.618