We introduce Motion-I2V, a novel framework for consistent and controllable text-guided image-to-video generation (I2V). In contrast to previous methods that directly learn the complicated image-to-video mapping, Motion-I2V factorizes I2V into two stages with explicit motion modeling. For the first stage, we propose a diffusion-based motion field predictor, which focuses on deducing the trajectories of the reference image's pixels. For the second stage, we propose motion-augmented temporal attention to enhance the limited 1-D temporal attention in video latent diffusion models. This module can effectively propagate reference image features to synthesized frames, guided by the trajectories predicted in the first stage. Compared with existing methods, Motion-I2V can generate more consistent videos even in the presence of large motion and viewpoint variation. By training a sparse trajectory ControlNet for the first stage, Motion-I2V lets users precisely control motion trajectories and motion regions through sparse trajectory and region annotations. This offers more control over the I2V process than relying solely on textual instructions. Additionally, Motion-I2V's second stage naturally supports zero-shot video-to-video translation. Both qualitative and quantitative comparisons demonstrate the advantages of Motion-I2V over prior approaches in consistent and controllable image-to-video generation. Please see our project page at https://xiaoyushi97.github.io/Motion-I2V/.
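To make the two-stage factorization concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes the first stage has already produced one displacement field per output frame, and it stands in for the second stage with a plain nearest-neighbor lookup of reference features along those displacements. The paper instead uses motion-augmented temporal attention inside a video latent diffusion model. The names `warp_reference` and `propagate_reference`, the scalar "features", and the `(dy, dx)` offset convention are all illustrative assumptions.

```python
from typing import List, Tuple

Feature = List[List[float]]         # H x W grid of scalar features (toy stand-in for latent features)
Flow = List[List[Tuple[int, int]]]  # H x W grid of (dy, dx) offsets from a target pixel to its reference source


def warp_reference(ref: Feature, flow: Flow) -> Feature:
    """For each target pixel, fetch the reference feature its (assumed) trajectory points back to."""
    h, w = len(ref), len(ref[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            # Clamp the source location so out-of-frame trajectories reuse the border feature.
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            out[y][x] = ref[sy][sx]
    return out


def propagate_reference(ref: Feature, flows: List[Flow]) -> List[Feature]:
    """Hypothetical stage-2 stand-in: one warped copy of the reference features per predicted motion field."""
    return [warp_reference(ref, flow) for flow in flows]


if __name__ == "__main__":
    ref = [[1.0, 2.0], [3.0, 4.0]]
    still = [[(0, 0), (0, 0)], [(0, 0), (0, 0)]]  # no motion: frame equals the reference
    shift = [[(0, 1), (0, 1)], [(0, 1), (0, 1)]]  # every pixel reads from its right-hand neighbor
    for frame in propagate_reference(ref, [still, shift]):
        print(frame)
```

In this toy version the lookup is hard and per-pixel. An attention-based module could instead treat the trajectory-aligned reference features as extra keys and values, which lets the model fall back on other cues where the predicted motion is unreliable.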
Xiaoyu Shi
Zhaoyang Huang
Fu-Yun Wang
Tsinghua University
Chinese University of Hong Kong
Group Sense (China)
Shi et al. studied this question.
www.synapsesocial.com/papers/68e60785b6db64358759ac43 — DOI: https://doi.org/10.1145/3641519.3657497