What type of study is this?

This is an experimental study.

October 5, 2025 · Open Access

Pack and Force Your Memory: Long-form and Consistent Video Generation

Key Points

MemoryPack enhances dynamic context modeling to achieve minute-level temporal consistency in video generation.
Direct Forcing reduces error propagation during inference by improving training-inference alignment in autoregressive models.
MemoryPack scales gracefully with video length, preserving computational efficiency and maintaining linear complexity.
Together, the two contributions improve the context consistency and reliability of long-form generation, making autoregressive video models more practical to use.

Abstract

Long-form video generation presents a dual challenge: models must capture long-range dependencies while preventing the error accumulation inherent in autoregressive decoding. To address these challenges, we make two contributions. First, for dynamic context modeling, we propose MemoryPack, a learnable context-retrieval mechanism that leverages both textual and image information as global guidance to jointly model short- and long-term dependencies, achieving minute-level temporal consistency. This design scales gracefully with video length, preserves computational efficiency, and maintains linear complexity. Second, to mitigate error accumulation, we introduce Direct Forcing, an efficient single-step approximating strategy that improves training-inference alignment and thereby curtails error propagation during inference. Together, MemoryPack and Direct Forcing substantially enhance the context consistency and reliability of long-form video generation, advancing the practical usability of autoregressive video models.
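The linear-complexity claim can be illustrated with a back-of-the-envelope count of attention query-key pairs. The sketch below is not the paper's implementation. It assumes, purely for illustration, that the video is generated segment by segment and that past context is compressed into a fixed number of memory tokens; the function names and the parameters `tokens_per_segment` and `memory_tokens` are hypothetical and do not come from the paper.

```python
def attention_pairs_full_history(num_segments: int, tokens_per_segment: int) -> int:
    """Count query-key pairs when each segment attends to every token so far.

    Baseline for comparison: the context grows with each segment, so the
    total cost grows quadratically in the number of segments.
    """
    total = 0
    for i in range(1, num_segments + 1):
        context = i * tokens_per_segment  # all tokens up to and including segment i
        total += tokens_per_segment * context
    return total


def attention_pairs_fixed_memory(
    num_segments: int, tokens_per_segment: int, memory_tokens: int
) -> int:
    """Count query-key pairs when each segment attends to itself plus a
    fixed-size memory (an illustrative assumption, not the paper's design).

    The per-segment cost is constant, so the total is linear in video length.
    """
    per_segment = tokens_per_segment * (tokens_per_segment + memory_tokens)
    return num_segments * per_segment


if __name__ == "__main__":
    for n in (10, 20):
        full = attention_pairs_full_history(n, tokens_per_segment=100)
        packed = attention_pairs_fixed_memory(n, tokens_per_segment=100, memory_tokens=50)
        print(f"segments={n:3d}  full-history={full:>9d}  fixed-memory={packed:>9d}")
```

Under these assumptions, doubling the number of segments doubles the fixed-memory cost but nearly quadruples the full-history cost, which is the practical meaning of linear versus quadratic scaling for minute-level video.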

Authors

Xiaofei Wu

Shandong University of Technology

Guozhen Zhang

XU Zhi-yong

University of Electronic Science and Technology of China
