In this study, we analyze the attention mechanism and propose a perspective in which the sequential inputs to attention do not need to be strictly ordered. We introduce bucket attention, an approach that organizes the context of large language models (LLMs) and handles contexts of any length using a fixed-size space. We also present techniques for converting pre-trained models based on traditional attention to the bucket attention framework, along with a method for training bucket attention models from scratch. These approaches offer practical ways to improve the efficiency and scalability of LLMs.
Zipeng Ye (Thu,) studied this question.
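
The claim that attention inputs need not be strictly ordered can be illustrated with a small check. The sketch below is not the bucket attention method itself, whose details are not given here. It only demonstrates a well-known property of standard scaled dot-product attention: with no positional encoding and no causal mask, the output for a single query does not change when the key/value pairs are shuffled together. The function name `attention` and the sizes `d` and `n` are illustrative assumptions.

```python
import math
import random


def attention(q, keys, values):
    """Single-query scaled dot-product attention.

    Assumption: no positional encoding and no causal mask are applied,
    so each key/value pair contributes independently of its position.
    """
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the value vectors.
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


random.seed(0)
d, n = 4, 6  # hypothetical head dimension and context length
q = [random.gauss(0, 1) for _ in range(d)]
keys = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
values = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

# Shuffle keys and values with the same permutation.
order = list(range(n))
random.shuffle(order)

out_original = attention(q, keys, values)
out_shuffled = attention(q, [keys[i] for i in order], [values[i] for i in order])

# The two outputs agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(out_original, out_shuffled))
print(max_diff < 1e-9)
```

In practice, order information usually enters through positional encodings and causal masking rather than through the attention sum itself, which is what makes it plausible to reorganize stored context.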