arXiv:2509.25035 Abstract | arXiv Analytics

arXiv:2509.25035 [cs.CL]

Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct

Haoyang Zheng, Xinyang Liu, Cindy Xiangrui Kong, Nan Jiang, Zheyuan Hu, Weijian Luo, Wei Deng, Guang Lin

Published 2025-09-29, updated 2025-10-01 (version 2)

Fast and high-quality language generation is the holy grail that people pursue in the age of AI. In this work, we introduce Discrete Diffusion Divergence Instruct (DiDi-Instruct), a training-based method that initializes from a pre-trained (masked) discrete diffusion language model (dLLM) and distills a few-step student for fast generation. The resulting DiDi-Instruct model achieves comparable or superior performance to its dLLM teacher and the GPT-2 baseline while enabling up to 64$\times$ acceleration. The theoretical foundation of DiDi-Instruct is a novel framework based on integral KL-divergence minimization, which yields a practical training algorithm. We further introduce grouped reward normalization, intermediate-state matching, and a reward-guided ancestral sampler, which together significantly improve training stability, model coverage, and inference quality. On OpenWebText, DiDi-Instruct achieves perplexity ranging from 62.2 at 8 function evaluations (NFEs) to 18.4 at 128 NFEs, outperforming prior accelerated dLLMs and the GPT-2 baseline. These gains come with a negligible entropy loss (around $1\%$) and reduce additional training wall-clock time by more than $20\times$ compared to competing dLLM distillation methods. We further validate the robustness and effectiveness of DiDi-Instruct through extensive ablation studies, model scaling, and the generation of discrete protein sequences. In conclusion, DiDi-Instruct is an efficient yet effective distillation method, enabling language generation in the blink of an eye. We will release both code and models at github.com/haoyangzheng-ai/didi-instruct.
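
To help read the perplexity figures quoted in the abstract, here is a minimal sketch of the standard definition: perplexity is the exponential of the mean negative log-likelihood per token, so lower is better. The function name and the sample log-probabilities are illustrative assumptions and are not taken from the paper. The abstract also does not say which model scores the generated text, so this shows only the generic formula, not the authors' evaluation pipeline.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical per-token log-probabilities (NOT values from the paper):
# every token is assigned probability 0.25 by the scoring model.
example_log_probs = [math.log(0.25)] * 4
print(perplexity(example_log_probs))
```

Under this definition, a model that assigns probability 0.25 to every token has a perplexity of about 4, which is one way to interpret the metric: the effective number of equally likely choices per token.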

Comments: 56 pages, 7 figures, 7 tables

Categories: cs.CL, cs.AI, cs.LG

Keywords: discrete diffusion divergence instruct, ultra-fast language generation, additional training wall-clock time, didi-instruct model achieves comparable
