What type of study is this?

This is an experimental study.

September 29, 2025 (Open Access)

SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation

Key Points

SRDiffusion accelerates video diffusion inference and, in the authors' experiments, outperforms existing acceleration approaches.
The method achieves over 3× speedup for Wan with nearly no quality loss on VBench, and 2× speedup for CogVideoX.
A large model handles the high-noise steps to preserve semantic and motion fidelity (Sketching), while a smaller model refines visual details in the low-noise steps (Rendering).
The framework is orthogonal to existing acceleration strategies and is presented as a practical route to scalable video generation.

Abstract

Leveraging the diffusion transformer (DiT) architecture, models like Sora, CogVideoX, and Wan have achieved remarkable progress in text-to-video, image-to-video, and video editing tasks. Despite these advances, diffusion-based video generation remains computationally intensive, especially for high-resolution, long-duration videos. Prior work accelerates inference by skipping computation, usually at the cost of severe quality degradation. In this paper, we propose SRDiffusion, a novel framework that leverages collaboration between large and small models to reduce inference cost. The large model handles high-noise steps to ensure semantic and motion fidelity (Sketching), while the smaller model refines visual details in low-noise steps (Rendering). Experimental results demonstrate that our method outperforms existing approaches, achieving over 3× speedup for Wan with nearly no quality loss on VBench, and 2× speedup for CogVideoX. Our method is introduced as a new direction orthogonal to existing acceleration strategies, offering a practical solution for scalable video generation.
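
The Sketching-Rendering split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed `switch_fraction`, the toy denoisers, and the cost model are all hypothetical, and the sketch assumes both models operate on the same latent representation so that the hand-off needs no conversion. The abstract does not state how the switch point is chosen, so a fixed fraction of steps is used here purely for illustration.

```python
from typing import Callable, List, Tuple

Latent = List[float]
# A denoiser maps (latent, timestep) to a less noisy latent.
Denoiser = Callable[[Latent, int], Latent]


def sr_sample(
    latent: Latent,
    large_model: Denoiser,
    small_model: Denoiser,
    num_steps: int = 50,
    switch_fraction: float = 0.3,  # hypothetical value, not taken from the paper
) -> Tuple[Latent, int, int]:
    """Run the high-noise steps on the large model and the rest on the small one."""
    switch_step = round(num_steps * switch_fraction)
    large_calls = 0
    small_calls = 0
    # Timesteps count down from high noise (num_steps - 1) to low noise (0).
    for i, t in enumerate(range(num_steps - 1, -1, -1)):
        if i < switch_step:
            latent = large_model(latent, t)  # Sketching: semantics and motion
            large_calls += 1
        else:
            latent = small_model(latent, t)  # Rendering: visual detail
            small_calls += 1
    return latent, large_calls, small_calls


def estimated_speedup(large_calls: int, small_calls: int, cost_ratio: float) -> float:
    """Speedup relative to running every step on the large model.

    cost_ratio is the per-step cost of the small model divided by that of
    the large model (an assumed input, not a measured figure).
    """
    total = large_calls + small_calls
    return total / (large_calls + small_calls * cost_ratio)


# Toy stand-ins for real video diffusion models: each just scales the latent.
def toy_large(x: Latent, t: int) -> Latent:
    return [0.5 * v for v in x]


def toy_small(x: Latent, t: int) -> Latent:
    return [0.9 * v for v in x]


if __name__ == "__main__":
    out, n_large, n_small = sr_sample([1.0], toy_large, toy_small, num_steps=10)
    print(n_large, n_small, estimated_speedup(n_large, n_small, cost_ratio=0.1))
```

Under this simple cost model, the achievable speedup depends on two quantities: how early the hand-off to the small model happens and how much cheaper the small model is per step. The speedups reported in the abstract come from the authors' experiments, not from this estimate.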
