Mainstream large language models (LLMs) operate under the next-token prediction (NTP) paradigm. Multi-token prediction (MTP) extends NTP so that an LLM can draft multiple tokens in each forward pass, with the goal of higher decoding efficiency. However, existing MTP approaches pretrain the MTP module jointly with the target LLM, which makes it difficult to enable MTP for LLMs that lack official MTP support. In this work, we propose a post-hoc approach to training an MTP module for a target LLM, providing an efficient way to evolve the LLM from NTP to MTP. The approach has two main characteristics: (1) the target LLM is left unchanged, since it is frozen during MTP training; and (2) MTP training is efficient, since it relies on self-distillation from the target LLM’s native NTP capability. Results show that our approach can train a performant MTP module post hoc via lightweight pretraining.
Xu et al. (Mon,) studied this question.
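To make the self-distillation idea concrete, the following is a minimal sketch of one plausible training objective, not the method as specified in this work. It assumes that the frozen target LLM's NTP distribution for a future token serves as the soft label for the MTP module's draft of that same token, and that the two are compared with a KL divergence. The function names, the `offset` parameter, and the choice of KL are all illustrative assumptions; logits are plain Python lists so the example runs without any deep-learning library.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def self_distill_loss(teacher_logits, draft_logits, offset):
    """Average KL between frozen-teacher NTP targets and MTP drafts.

    Assumed conventions (hypothetical, for illustration only):
      teacher_logits[t]: frozen target LLM's logits for token t+1,
                         computed from the true prefix up to position t.
      draft_logits[t]:   MTP module's logits for token t+offset,
                         computed from the prefix up to position t.
      offset:            how far ahead the MTP module drafts (>= 2).
    The teacher's prediction of token t+offset lives at index
    t + offset - 1, so the teacher sequence is shifted by offset - 1.
    """
    if offset < 2:
        raise ValueError("offset must be >= 2; offset 1 is ordinary NTP")
    if len(teacher_logits) != len(draft_logits):
        raise ValueError("teacher and draft must cover the same positions")
    shift = offset - 1
    n = len(teacher_logits) - shift
    if n <= 0:
        raise ValueError("sequence too short for the requested offset")
    total = 0.0
    for t in range(n):
        teacher = softmax(teacher_logits[t + shift])  # soft label, no gradient
        student = softmax(draft_logits[t])            # trainable MTP output
        total += kl_divergence(teacher, student)
    return total / n
```

In an actual training loop, only the MTP module's parameters would receive gradients from this loss, while the teacher logits would be computed without gradient tracking, which matches the frozen-target property described above. The loss is zero when each draft distribution equals the teacher's distribution for the same future token, and positive otherwise.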