We propose a Cognitive Load Theory (CLT)-informed attention mechanism for transformer-based text classification. The mechanism computes a per-token cognitive-load signal (derived from attention entropy, margin-based classification uncertainty, and, optionally, inverse document frequency) and maps this signal to a learnable attention “budget” that scales outgoing attention mass during decoding. Unlike architectural efficiency techniques such as Multi-Query or Grouped-Query Attention, the CLT mechanism requires no structural modifications and introduces only modest per-step computational overhead while preserving full compatibility with standard transformer architectures. Experiments on four datasets (IMDB, AG News, SST-2, and DBpedia) show that CLT-informed attention matches or exceeds the accuracy of a fixed-budget baseline while delivering consistently lower test loss, faster convergence to the best validation checkpoint, reduced attention entropy, and strong alignment between cognitive load and attention mass. Among all variants, an entropy-only load signal yields the most stable and consistent performance across datasets. These results demonstrate that lightweight, cognitively motivated constraints can structure transformer attention while maintaining or improving downstream classification performance.
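To make the load-to-budget idea concrete, the following is a minimal sketch of the entropy-only variant for a single query token. It is not the authors' implementation: the abstract does not specify the functional form of the mapping, so the sigmoid map, the parameters `w` and `b` (standing in for learnable weights), the entropy normalization, and all function names are assumptions chosen for illustration. The margin-based uncertainty and inverse document frequency terms are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy divided by log(n), so the result lies in [0, 1]."""
    n = len(probs)
    if n < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(n)


def attention_budget(load, w=1.0, b=0.0):
    """Hypothetical load-to-budget map: sigmoid(w * load + b).

    w and b stand in for learnable parameters. Their values, and even the
    sign of w, are illustrative assumptions, not taken from the source.
    """
    return 1.0 / (1.0 + math.exp(-(w * load + b)))


def clt_attention_row(scores, w=1.0, b=0.0):
    """Scale one query token's attention distribution by its budget.

    Returns the scaled attention weights, the load signal, and the budget.
    The scaled weights sum to the budget rather than to 1.
    """
    probs = softmax(scores)
    load = normalized_entropy(probs)       # entropy-only load signal
    budget = attention_budget(load, w, b)  # scalar in (0, 1)
    scaled = [budget * p for p in probs]   # outgoing attention mass
    return scaled, load, budget


if __name__ == "__main__":
    for name, scores in [("diffuse", [0.0, 0.0, 0.0, 0.0]),
                         ("peaked", [10.0, 0.0, 0.0, 0.0])]:
        scaled, load, budget = clt_attention_row(scores)
        print(f"{name}: load={load:.4f} budget={budget:.4f} "
              f"mass={sum(scaled):.4f}")
```

In this sketch a diffuse attention row has a load near 1 and a sharply peaked row has a load near 0; how that load should translate into a larger or smaller budget depends on the learned parameters, which the abstract leaves unspecified.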
Graham et al. (Sat,) studied this question.