April 1, 2026 · Open Access

A Cognitive Load Theory-Informed Attention Mechanism for Transformer-Based Text Classification

Key Points

This research aims to enhance transformer-based text classification by integrating cognitive load theory into the attention mechanism.
Developed a cognitive load theory-informed attention mechanism that requires no structural changes to the transformer architecture.
Computed a per-token cognitive-load signal from attention entropy and margin-based classification uncertainty, with an optional inverse document frequency term.
Tested the mechanism on four datasets: IMDB, AG News, SST-2, and DBpedia.
Compared classification accuracy and test loss against a fixed-budget baseline.
Achieved accuracy comparable to or exceeding that of the fixed-budget baseline.
Observed consistently lower test loss and faster convergence to the best validation checkpoint than the baseline.
Found strong alignment between cognitive load and attention mass, while maintaining or improving classification performance.
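The per-token load signal described in the key points can be illustrated with a minimal sketch in plain Python. It assumes a single attention matrix whose rows are query tokens, a softmax classifier over class logits, and a weighted sum as the combination rule. The function names, the weights `w_ent`, `w_unc`, and `w_idf`, the entropy normalization, and the choice to broadcast the sequence-level margin uncertainty to every token are hypothetical choices, not the paper's published formulas.

```python
import math
from typing import List, Optional, Sequence


def attention_entropy(attn: Sequence[Sequence[float]]) -> List[float]:
    """Normalized Shannon entropy of each query row of an attention matrix.

    attn[i][j] is the weight query token i places on key token j; each row
    sums to 1. Returns one value per token in [0, 1] (1 = uniform, 0 = one-hot).
    """
    max_ent = math.log(len(attn[0]))  # entropy of a uniform row; needs >= 2 keys
    out = []
    for row in attn:
        h = -sum(p * math.log(p) for p in row if p > 0.0)  # 0 * log 0 treated as 0
        out.append(h / max_ent)
    return out


def margin_uncertainty(logits: Sequence[float]) -> float:
    """1 - (p_top1 - p_top2) under a softmax over class logits (>= 2 classes)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = sorted((e / total for e in exps), reverse=True)
    return 1.0 - (probs[0] - probs[1])


def cognitive_load(attn: Sequence[Sequence[float]],
                   logits: Sequence[float],
                   idf: Optional[Sequence[float]] = None,
                   w_ent: float = 1.0,
                   w_unc: float = 1.0,
                   w_idf: float = 1.0) -> List[float]:
    """Hypothetical weighted combination of the three load components."""
    ent = attention_entropy(attn)
    # Assumption: one uncertainty value per sequence, shared by all tokens.
    unc = margin_uncertainty(logits)
    load = [w_ent * e + w_unc * unc for e in ent]
    if idf is not None:
        idf_max = max(idf)  # scale IDF into [0, 1] before weighting
        load = [x + w_idf * (v / idf_max) for x, v in zip(load, idf)]
    return load
```

Setting `w_unc=0.0` and leaving `idf` as `None` gives an entropy-only signal, the variant the abstract reports as the most stable across datasets.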

Abstract

We propose a Cognitive Load Theory (CLT)-informed attention mechanism for transformer-based text classification. The proposed attention mechanism computes a per-token cognitive-load signal—derived from attention entropy, margin-based classification uncertainty, and optional inverse document frequency—and maps this signal to a learnable attention “budget” that scales outgoing attention mass during decoding. Unlike architectural efficiency techniques such as Multi-Query or Grouped-Query Attention, the CLT mechanism requires no structural modifications and introduces only modest per-step computational overhead while preserving full compatibility with standard transformer architectures. Experiments across four datasets (IMDB, AG News, SST-2, and DBpedia) show that CLT-informed attention achieves accuracy comparable to or exceeding a fixed-budget baseline while delivering consistently lower test loss, faster convergence to the best validation checkpoint, reduced attention entropy, and strong alignment between cognitive load and attention mass. Among all variants, an entropy-only load signal yields the most stable and consistent performance across datasets. These results demonstrate that lightweight, cognitively motivated constraints can structure transformer attention while maintaining or improving downstream classification performance.
