What question did this study set out to answer?

The study asks whether filtering attention maps in the frequency domain can improve transformer language models, and it introduces spectral attention as a way to do so.

March 26, 2026 · Open Access

Spectral attention for transformers: frequency-domain filtering of attention maps

Key Points

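Introduced spectral attention, which filters the attention score matrix via FFT/IFFT with learnable, per-head masks.
Studied nine variants of the mechanism, including an adaptive variant that modulates the masks from input content.
Evaluated on the WikiText-2, Penn Treebank, and WikiText-103 datasets.
Achieved a 10.7% reduction in perplexity on WikiText-2 over standard attention with the adaptive variant.
Achieved a 15.3% reduction in perplexity on WikiText-103 over standard attention with the adaptive variant.
Found that low-frequency components carry the most useful signal for language modeling and that learned frequency preferences outperform fixed low-, high-, and band-pass filters.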

Abstract

This paper introduces spectral attention, which filters the attention score matrix directly in the frequency domain via FFT/IFFT with learnable, per-head masks. This complements the time-domain view by enabling explicit control over low-, mid-, and high-frequency components of attention patterns. We study nine variants, including an adaptive mechanism that modulates masks from input content. On WikiText-2, Penn Treebank, and WikiText-103, the adaptive spectral variant consistently improves over standard attention, reducing perplexity by 10.7% on WikiText-2 and 15.3% on WikiText-103 in our setup. Analysis shows low-frequency components carry the most useful signal and that learned frequency preferences outperform fixed low/high/band-pass filters. These results indicate that frequency-domain processing is an effective complement for autoregressive transformer language modeling in our evaluated settings.
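The filtering step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. Several choices here are assumptions that this summary does not confirm: the 2D FFT is taken over the query and key axes of the pre-softmax scores, the mask is a real-valued gain per frequency bin, only the real part of the inverse transform is kept, and causal masking is applied after filtering. The names `spectral_attention` and `low_pass_mask` are hypothetical. In a trained model the mask would be a learned parameter; here it is passed in as a plain array.

```python
import numpy as np


def low_pass_mask(heads, seq_len, cutoff):
    """Fixed low-pass mask (hypothetical helper): keep bins with |freq| <= cutoff.

    Frequencies are in cycles per position, so magnitudes range from 0 to 0.5.
    """
    freqs = np.abs(np.fft.fftfreq(seq_len))
    keep = (freqs[:, None] <= cutoff) & (freqs[None, :] <= cutoff)
    return np.broadcast_to(keep.astype(float), (heads, seq_len, seq_len)).copy()


def spectral_attention(queries, keys, values, freq_mask, causal=True):
    """Attention with frequency-domain filtering of the score matrix.

    queries, keys, values: arrays of shape (heads, seq_len, dim)
    freq_mask: real gains of shape (heads, seq_len, seq_len), one per frequency
               bin and head (assumed layout; learnable in a real model)
    """
    heads, seq_len, dim = queries.shape
    scores = queries @ keys.transpose(0, 2, 1) / np.sqrt(dim)

    # Assumption: filter the pre-softmax scores with a 2D FFT over (query, key).
    spectrum = np.fft.fft2(scores, axes=(-2, -1))
    filtered = np.fft.ifft2(spectrum * freq_mask, axes=(-2, -1)).real

    if causal:
        # Assumption: block attention to future positions after filtering.
        future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
        filtered = np.where(future, -np.inf, filtered)

    # Numerically stable softmax over the key axis.
    filtered = filtered - filtered.max(axis=-1, keepdims=True)
    weights = np.exp(filtered)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.normal(size=(2, 8, 4)) for _ in range(3))
    out, weights = spectral_attention(q, k, v, low_pass_mask(2, 8, cutoff=0.25))
    print(out.shape, weights.shape)
```

With an all-ones mask the filter is the identity and the function reduces to standard causal attention, which makes a convenient sanity check. Filtering the full score matrix mixes information across positions before the causal mask is applied. This summary does not say how the paper handles that in the autoregressive setting, so the ordering above should be read as a placeholder.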
