Diffusion- and flow-based models have made significant progress in video synthesis, but they require many iterative sampling steps, which incurs substantial computational overhead. Many distillation methods that rely solely on either trajectory preservation or distribution matching have been developed to accelerate video generation models, yet these approaches often suffer from performance breakdown or increased artifacts in few-step settings. To address these limitations, we propose SwiftVideo, a unified and stable distillation framework that combines the advantages of trajectory-preserving and distribution-matching strategies. Our approach first introduces continuous-time consistency distillation to ensure precise preservation of ODE trajectories. We then propose a dual-perspective alignment that comprises distribution alignment between synthetic and real data and trajectory alignment across different inference steps. Our method maintains high-quality video generation while substantially reducing the number of inference steps. Quantitative evaluations on the OpenVid-1M benchmark demonstrate that our method significantly outperforms existing approaches in few-step video generation.
Yanxiao Sun
Jiafu Wu
Yun Cao
Sun et al. studied this question.
www.synapsesocial.com/papers/68f10ecee6a12fd042899a73 — DOI: https://doi.org/10.48550/arxiv.2508.06082