Current large language model applications are built on generative language models, which typically generate tokens autoregressively. This approach has two key limitations: its unidirectional causal attention mechanism restricts semantic expressiveness, and its deep decoder slows decoding. To address these issues, we introduce the autoregressive language model with historical context re-encoding (HCR). Our method improves the encoding of historical tokens by periodically re-encoding newly generated tokens. The model incorporates a history encoder and uses a relatively shallow decoder for short-segment decoding. This architecture improves generation quality, accelerates decoding, and operates efficiently in both generation and comprehension modes. Experiments show that HCR significantly outperforms standard autoregressive models on a range of language comprehension and generation tasks, with an average performance gain of over 2.3% and a 1.3x improvement in decoding speed.
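The decoding loop described above can be sketched in a few lines of Python. This is only an illustration of the control flow under stated assumptions, not the paper's implementation: the names `generate_with_reencoding`, `encode_history`, `decode_next`, and `segment_len` are hypothetical, the fixed-length re-encoding schedule is an assumption, and the two toy functions stand in for the history encoder and the shallow decoder.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in: "memory" is whatever the history encoder emits.
Memory = List[int]


def generate_with_reencoding(
    prompt: Sequence[int],
    encode_history: Callable[[Sequence[int]], Memory],
    decode_next: Callable[[Memory, Sequence[int]], int],
    max_new_tokens: int,
    segment_len: int,
) -> Tuple[List[int], int]:
    """Generate tokens, re-encoding the full history every `segment_len` new tokens.

    Returns the full token sequence and the number of history-encoder calls.
    """
    if segment_len < 1:
        raise ValueError("segment_len must be at least 1")

    tokens = list(prompt)
    memory = encode_history(tokens)  # initial encoding of the prompt
    encoder_calls = 1
    segment: List[int] = []  # tokens generated since the last re-encoding

    for _ in range(max_new_tokens):
        # The decoder sees the encoded history plus the short current segment.
        next_token = decode_next(memory, segment)
        tokens.append(next_token)
        segment.append(next_token)

        # Assumed schedule: once the segment is full, fold it into the history.
        if len(segment) == segment_len:
            memory = encode_history(tokens)
            encoder_calls += 1
            segment = []

    return tokens, encoder_calls


# Toy stand-ins so the sketch runs without any model weights.
def toy_encode(history: Sequence[int]) -> Memory:
    return list(history)  # a real encoder would return contextual representations


def toy_decode(memory: Memory, segment: Sequence[int]) -> int:
    return len(memory) + len(segment)  # "predicts" the next position index


if __name__ == "__main__":
    out, calls = generate_with_reencoding([0, 1, 2], toy_encode, toy_decode, 5, 2)
    print(out, calls)
```

In this sketch the decoder only ever processes the encoded history and a segment of at most `segment_len` tokens, which is the point where a shallower decoder could save time. How the real model schedules re-encoding and exchanges information between encoder and decoder is not specified in the abstract.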
Yimeng Zhuang studied this question.