Diffusion models have significantly advanced text-to-image generation, producing high-quality and imaginative results. However, their multi-step sampling process is slow and requires many inference steps to reach satisfactory quality. Despite attempts to improve sampling speed and computational efficiency through distillation, a functional one-step model has remained elusive. In this study, we investigate Rectified Flow, a recent method that has so far been applied mainly to small datasets, as a potential solution. Central to Rectified Flow is its reflow procedure, which straightens the trajectories of the probability flow, refines the coupling between noise and images, and makes distillation into a student model more effective. We introduce a novel text-conditioned pipeline that converts Stable Diffusion (SD) into an ultra-fast one-step model. Our approach underscores the crucial role of reflow in improving the assignment between noise and images. Using this pipeline, we develop the first one-step diffusion-based text-to-image generator capable of producing high-quality images comparable to those generated by SD. Additionally, we extend the methodology to audio inputs and demonstrate that it can generate images from audio cues with high fidelity and speed.

Key Words: Stable Diffusion, Text-to-Image Creation, Image Processing
Satish Karanjekar (Thu,) studied this question.
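
To make the role of reflow in the abstract more concrete, the following is a minimal sketch of the generic Rectified Flow idea on toy vectors. It is not the pipeline described in this work: it has no text conditioning and does not involve Stable Diffusion, and all function names (`interpolate`, `rectified_flow_loss`, `euler_sample`, `reflow_pairs`) are hypothetical, chosen only for illustration. The sketch assumes the usual formulation in which a velocity field is trained to match the straight-line direction from a noise sample to its paired image, and in which reflow rebuilds the noise-image pairs by simulating the current flow.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Point on the straight line from noise x0 (t=0) to image x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1

def velocity_target(x0, x1):
    """Regression target for the velocity field along that straight line."""
    return x1 - x0

def rectified_flow_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t = interpolate(x0, x1, t)
    pred = velocity_fn(x_t, t)
    return float(np.mean((pred - velocity_target(x0, x1)) ** 2))

def euler_sample(velocity_fn, x0, n_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with n_steps Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

def reflow_pairs(velocity_fn, noise_batch, n_steps=50):
    """Build new (noise, sample) couplings by simulating the current flow.

    Retraining on these pairs is what the reflow step refers to in this sketch.
    """
    return noise_batch, euler_sample(velocity_fn, noise_batch, n_steps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.normal(size=(4, 3))            # toy "noise" batch
    shift = np.array([1.0, -2.0, 0.5])      # toy, perfectly straight flow

    def straight_v(x, t):
        # Constant velocity: every trajectory is a straight line.
        return np.broadcast_to(shift, x.shape)

    one_step = euler_sample(straight_v, x0, 1)
    many_steps = euler_sample(straight_v, x0, 50)
    print(np.allclose(one_step, many_steps))
```

In this toy setting the velocity field is exactly straight, so a single Euler step reaches the same endpoint as fifty steps. That is the intuition behind using reflow before distillation: the straighter the trajectories, the less a one-step student has to approximate.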