Exploring and enhancing the transfer of distribution in knowledge distillation for autoregressive language models