April 7, 2026 | Open Access

Gated spatial attention for entity linking in visually rich documents

Key Points

The study aims to improve entity linking by better capturing layout-dependent relationships between entities in documents.
Developed the Gated Spatial Attention (GSA) framework, a lightweight plug-in for LayoutLMv3.
Implemented Spatial Position Enhancement (SPE), which adds ALiBi-style linear biases to every attention layer.
Introduced Gated Attention (GA), a per-token scalar gate that suppresses outputs from irrelevant tokens.
GSA outperformed existing models on both semantic entity recognition and relation extraction.
Achieved state-of-the-art performance on the FUNSD and CORD datasets with negligible computational overhead.
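
The SPE idea can be made concrete with a small sketch. Following the abstract, each attention head adds a linear penalty -m_h · d(i, j) to its attention logits in every layer, where the distance d depends on the head's group (reading order, horizontal, or vertical distance). Everything else here is an assumption for illustration, not the authors' implementation: the geometric slopes follow the original ALiBi recipe, heads are split into four equal contiguous groups, tokens are represented by bounding-box centers, and the semantic-proximity group receives no bias because the source does not say how that proximity is measured. The helper names (`spatial_bias`, `biased_softmax`, etc.) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float]  # (x_center, y_center) of a token's bounding box
Matrix = List[List[float]]

# Head groups named in the abstract; the equal, contiguous split is an assumption.
GROUPS = ("reading_order", "horizontal", "vertical", "semantic")


def alibi_slopes(num_heads: int) -> List[float]:
    """Geometric ALiBi slopes m_h = 2^(-8h/n) for h = 1..n."""
    return [2.0 ** (-8.0 * h / num_heads) for h in range(1, num_heads + 1)]


def head_group(head: int, num_heads: int) -> str:
    """Map head index 0..n-1 to one of four equal, contiguous groups (assumption)."""
    return GROUPS[head * len(GROUPS) // num_heads]


def distance(group: str, i: int, j: int, boxes: Sequence[Box]) -> float:
    """Non-negative distance between tokens i and j as seen by one head group."""
    if group == "reading_order":
        return float(abs(i - j))  # assumes tokens are already in reading order
    if group == "horizontal":
        return abs(boxes[i][0] - boxes[j][0])
    if group == "vertical":
        return abs(boxes[i][1] - boxes[j][1])
    return 0.0  # 'semantic': placeholder, its definition is not given in the source


def spatial_bias(boxes: Sequence[Box], num_heads: int) -> List[Matrix]:
    """bias[h][i][j] = -m_h * d_group(h)(i, j), to be added to the logits of every layer."""
    n = len(boxes)
    slopes = alibi_slopes(num_heads)
    bias = []
    for h in range(num_heads):
        g = head_group(h, num_heads)
        bias.append([[-slopes[h] * distance(g, i, j, boxes) for j in range(n)]
                     for i in range(n)])
    return bias


def biased_softmax(scores: Matrix, bias_h: Matrix) -> Matrix:
    """Row-wise softmax of (scores + bias) for one head; scores = QK^T / sqrt(d)."""
    weights = []
    for row_s, row_b in zip(scores, bias_h):
        logits = [s + b for s, b in zip(row_s, row_b)]
        mx = max(logits)  # subtract the row max for numerical stability
        exps = [math.exp(v - mx) for v in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    # Toy page: tokens 0 and 2 share a column; token 1 sits 300 units to the right.
    boxes = [(100.0, 50.0), (400.0, 50.0), (100.0, 300.0)]
    bias = spatial_bias(boxes, num_heads=8)
    flat_scores = [[0.0] * 3 for _ in range(3)]  # no content signal, bias only
    # Head 2 falls in the 'horizontal' group, so token 0 attends mostly to tokens 0 and 2.
    print(biased_softmax(flat_scores, bias[2])[0])
```

Because the bias is added in every layer rather than only at the input embedding, the layout signal is re-applied at each depth, which is the motivation the abstract gives for SPE. The slope m_h sets how sharply a head favors nearby tokens, so different heads in the same group cover different spatial ranges.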

Abstract

Entity linking in visually rich documents aims to identify semantic relationships between entities (e.g., key–value pairs) by jointly leveraging textual, visual, and spatial information. Despite the success of pre-trained document models such as LayoutLMv3, two challenges remain for relation extraction: (1) spatial position signals injected only at the input embedding layer tend to decay in deeper transformer layers, weakening the model’s ability to capture layout-dependent entity associations; and (2) in long documents, softmax attention distributes weights across many irrelevant tokens, diluting the focus on informative regions. To address these issues, we propose Gated Spatial Attention (GSA), a lightweight plug-in framework on top of LayoutLMv3 that comprises two complementary modules: Spatial Position Enhancement (SPE), which injects ALiBi-style linear biases into every attention layer with head groups specialized for reading-order, horizontal, vertical, and semantic proximity, and Gated Attention (GA), which applies a per-token scalar gate after the scaled dot-product attention to suppress outputs from irrelevant tokens. Experiments on FUNSD and CORD demonstrate that GSA consistently improves both semantic entity recognition and relation extraction, achieving state-of-the-art results with negligible computational overhead.
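
The GA module can be sketched in the same pure-Python style. The abstract fixes only two properties: the gate is a scalar per token, and it is applied after scaled dot-product attention. The sketch below assumes the gate is a sigmoid of a linear projection of the token's hidden state and shows a single head; the parameter names `gate_w` and `gate_b` are hypothetical stand-ins for learned weights, and this is an illustration under those assumptions rather than the authors' code.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(xs: Sequence[float]) -> Vector:
    mx = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa(queries: Sequence[Vector], keys: Sequence[Vector],
         values: Sequence[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention: softmax(q.k / sqrt(d)) @ V."""
    d = len(queries[0])
    out = []
    for q in queries:
        w = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(wj * v[c] for wj, v in zip(w, values))
                    for c in range(len(values[0]))])
    return out


def gated_attention(hidden: Sequence[Vector], queries: Sequence[Vector],
                    keys: Sequence[Vector], values: Sequence[Vector],
                    gate_w: Vector, gate_b: float) -> List[Vector]:
    """SDPA followed by a per-token scalar gate g_i = sigmoid(gate_w . hidden_i + gate_b).

    Each token's attention output is scaled by its own g_i in (0, 1), so a token
    whose gate logit is strongly negative has its output pushed toward zero.
    """
    attn = sdpa(queries, keys, values)
    gated = []
    for h_i, o_i in zip(hidden, attn):
        g = sigmoid(dot(gate_w, h_i) + gate_b)  # one scalar per token
        gated.append([g * c for c in o_i])
    return gated


if __name__ == "__main__":
    hidden = [[5.0, 0.0], [-5.0, 0.0]]  # toy hidden states for two tokens
    values = [[1.0, 2.0], [3.0, 4.0]]
    out = gated_attention(hidden, hidden, hidden, values, gate_w=[2.0, 0.0], gate_b=0.0)
    print(out)  # token 1's gate is sigmoid(-10), so its output is close to zero
```

Unlike the softmax weights, the gate values are not normalized across tokens, so the model can attenuate a token's entire attention output without redistributing that mass to other tokens. This is one plausible reading of how GA counters the attention dilution described in the abstract.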