Distilled video generation models offer fast and efficient synthesis but struggle with motion customization when guided by reference videos, especially under training-free settings. Existing training-free methods, originally designed for standard diffusion models, fail to generalize due to the accelerated generative process and large denoising steps in distilled models. To address this, we propose MotionEcho, a novel training-free test-time distillation framework that enables motion customization by leveraging diffusion teacher forcing. Our approach uses high-quality, slow teacher models to guide the inference of fast student models through endpoint prediction and interpolation. To maintain efficiency, we dynamically allocate computation across timesteps according to guidance needs. Extensive experiments across various distilled video generation models and benchmark datasets demonstrate that our method significantly improves motion fidelity and generation quality while preserving high efficiency. Project page: https://euminds.github.io/motionecho/
Rong Jin
Shandong Normal University
Xin Xie
Fuyang Normal University
Xinyi Yu
Hong Kong University of Science and Technology
Jin et al. studied this question.
synapsesocial.com/papers/68de84bf5b556a9128e1be0a — DOI: https://doi.org/10.48550/arxiv.2506.19348