What question did this study set out to answer?

The central aim is to evolve large language models from next-token prediction to multi-token prediction through a post-hoc approach.

April 10, 2026 | Open Access

Evolving LLMs from Next-Token Prediction to Multi-Token Prediction via Self-Distillation

Key points

Proposes a self-distillation method that trains a multi-token prediction (MTP) module without altering the target language model.
Keeps the target language model frozen during MTP training, so the original model is left unchanged.
Trains the MTP module with lightweight pretraining rather than pretraining it together with the target model.
Reports that the post-hoc approach yields a performant MTP module.
Targets higher decoding efficiency by letting the model draft multiple tokens in each forward pass.

Abstract

Mainstream Large Language Models (LLMs) work under the paradigm of Next-Token Prediction (NTP). Multi-Token Prediction (MTP) is motivated by higher decoding efficiency, extending NTP to enable LLMs to draft multiple tokens during each forward pass. However, existing MTP approaches pretrain MTP along with the target LLM, making it difficult to unlock MTP for LLMs without official support. In this work, we propose a post-hoc approach to training an MTP module for a target LLM, providing an efficient way to evolve the LLM from NTP to MTP. The proposed approach features two main characteristics. (1) No changes to the target LLM, since it is frozen during MTP training. (2) Efficient MTP training via self-distillation from the target LLM’s native NTP capability. Results show that our approach can post-hoc train a performant MTP module via lightweight pretraining.
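
To make the self-distillation idea concrete, here is a minimal sketch of how training labels for an MTP module could be produced by the frozen target model itself. It is not the paper's implementation. The toy model `toy_next_token`, the helper names, and the draft-checking step are all assumptions made for illustration. The abstract only states that the target LLM is frozen and that the MTP module learns from its native NTP capability.

```python
from typing import Callable, List

NextTokenFn = Callable[[List[int]], int]


def toy_next_token(context: List[int], vocab_size: int = 7) -> int:
    """Hypothetical stand-in for the frozen target LLM's greedy next-token step."""
    return (2 * context[-1] + 1) % vocab_size


def self_distill_targets(next_token: NextTokenFn, prefix: List[int], k: int) -> List[int]:
    """Roll the frozen NTP model forward k steps from `prefix`.

    The k generated tokens serve as the labels an MTP module would be
    trained to predict in one pass (assumed training setup).
    """
    context = list(prefix)  # copy so the caller's prefix is not mutated
    labels: List[int] = []
    for _ in range(k):
        token = next_token(context)
        labels.append(token)
        context.append(token)
    return labels


def accepted_draft_length(next_token: NextTokenFn, prefix: List[int], draft: List[int]) -> int:
    """Count the leading drafted tokens that match the frozen model's own choices."""
    context = list(prefix)
    accepted = 0
    for token in draft:
        if next_token(context) != token:
            break
        context.append(token)
        accepted += 1
    return accepted


labels = self_distill_targets(toy_next_token, prefix=[3], k=3)
agreement = accepted_draft_length(toy_next_token, prefix=[3], draft=[0, 1, 5])
```

Because the labels come from the target model's own rollouts, this setup needs no change to the target model and no external labeled data. `accepted_draft_length` gives one simple way to measure how closely a drafted token sequence agrees with the frozen model.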
