

March 24, 2024 (Open Access)

Cached Transformers: Improving Transformers with Differentiable Memory Cache


Abstract

This work introduces a new Transformer model called Cached Transformer, which uses Gated Recurrent Cached (GRC) attention to extend the self-attention mechanism with a differentiable memory cache of tokens. GRC attention enables attending to both past and current tokens, increasing the receptive field of attention and allowing for exploring long-range dependencies. By utilizing a recurrent gating unit to continuously update the cache, our model achieves significant advancements in six language and vision tasks, including language modeling, machine translation, ListOps, image classification, object detection, and instance segmentation. Furthermore, our approach surpasses previous memory-based techniques in tasks such as language modeling and can be applied to a broader range of settings.
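
The idea of attending to a gated, recurrently updated cache alongside the current tokens can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the update rule, so the mean-pooled summary, the single scalar sigmoid gate per cache slot, the missing learned projections, and the names `update_cache`, `cached_attention`, and `gate_weights` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


def cached_attention(tokens, cache):
    """Let current tokens attend to the cache (past) and to themselves (present)."""
    memory = cache + tokens  # list concatenation: cache slots first, then tokens
    return attend(tokens, memory, memory)


def update_cache(cache, tokens, gate_weights):
    """Gated recurrent cache update (assumed form, not taken from the paper)."""
    dim = len(tokens[0])
    # Assumption: summarise the current tokens by mean pooling.
    summary = [sum(t[j] for t in tokens) / len(tokens) for j in range(dim)]
    new_cache = []
    for slot in cache:
        # Assumption: one scalar sigmoid gate per slot, from the pair [slot, summary].
        logit = sum(w * x for w, x in zip(gate_weights, slot + summary))
        gate = 1.0 / (1.0 + math.exp(-logit))
        # Convex blend: keep (1 - gate) of the old slot, write gate of the summary.
        new_cache.append([(1.0 - gate) * c + gate * s for c, s in zip(slot, summary)])
    return new_cache


# Hypothetical toy inputs: two 2-dimensional tokens and a one-slot cache.
tokens = [[1.0, 0.0], [0.0, 1.0]]
cache = [[0.0, 0.0]]
gate_weights = [0.1, 0.2, 0.3, 0.4]  # length 2 * dim

output = cached_attention(tokens, cache)           # attend to past and present
cache = update_cache(cache, tokens, gate_weights)  # then refresh the cache
```

Because the update is a convex blend, every operation in the sketch is differentiable, which is the property that would let such a cache be trained end to end with the rest of the model.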


Cite This Study

Zhang et al. (2024) studied this question.

synapsesocial.com/papers/68e72962b6db6435876a3402
https://doi.org/10.1609/aaai.v38i15.29636