March 3, 2024 · Open Access

The Hidden Attention of Mamba Models

Abstract

The Mamba layer offers an efficient selective state space model (SSM) that is highly effective in modeling multiple domains, including NLP, long-range sequence processing, and computer vision. Selective SSMs are viewed as dual models, in which one trains in parallel on the entire sequence via an IO-aware parallel scan and deploys in an autoregressive manner. We add a third view and show that such models can be viewed as attention-driven models. This new perspective enables us to compare the underlying mechanisms to those of the self-attention layers in transformers and allows us to peer inside the inner workings of the Mamba model with explainability methods. Our code is publicly available.
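
The "third view" can be illustrated with a toy example. The sketch below is not the authors' code; it assumes a single channel with a diagonal, time-varying linear recurrence h_t = a_t * h_{t-1} + b_t * x_t, y_t = <c_t, h_t>, and the names `recurrent_scan` and `hidden_attention` are hypothetical. In Mamba the per-step parameters are computed from the input; here they are simply passed in as arrays. Unrolling the recurrence gives y = alpha @ x with a lower-triangular matrix alpha, which plays the role of a causal attention matrix.

```python
import numpy as np

def recurrent_scan(a, b, c, x):
    """Autoregressive view: h_t = a_t * h_{t-1} + b_t * x_t, y_t = <c_t, h_t>."""
    L, N = a.shape
    h = np.zeros(N)
    y = np.zeros(L)
    for t in range(L):
        h = a[t] * h + b[t] * x[t]
        y[t] = c[t] @ h
    return y

def hidden_attention(a, b, c):
    """Attention-like view: lower-triangular alpha such that y = alpha @ x.

    alpha[i, j] = <c_i, (a_{j+1} * ... * a_i) * b_j> for j <= i, else 0.
    """
    L, N = a.shape
    alpha = np.zeros((L, L))
    for i in range(L):
        decay = np.ones(N)  # running product of a_k for k = j+1 .. i
        for j in range(i, -1, -1):
            alpha[i, j] = c[i] @ (decay * b[j])
            decay = decay * a[j]
    return alpha

# Toy parameters (assumed for illustration; not derived from x as in Mamba).
rng = np.random.default_rng(0)
L, N = 6, 4
a = rng.uniform(0.5, 1.0, size=(L, N))
b = rng.normal(size=(L, N))
c = rng.normal(size=(L, N))
x = rng.normal(size=L)

alpha = hidden_attention(a, b, c)
# Both views produce the same outputs.
assert np.allclose(alpha @ x, recurrent_scan(a, b, c, x))
```

Row i of `alpha` shows how strongly output position i draws on each earlier input position j, which is the kind of quantity attention-based explainability methods inspect.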

Cite This Study

Ali et al. (2024) studied this question.

synapsesocial.com/papers/68e75ef0b6db6435876d58c7 https://doi.org/10.48550/arxiv.2403.01590