Abstract

Diffusion transformers have achieved remarkable success in image generation but incur high sampling costs. Existing caching strategies typically fix the cache ratio at a low value and therefore fail to fully exploit the acceleration potential of caching. To address this limitation, we propose HiCache, a unified hierarchical timestep-aware caching framework. We first propose timestep block cascade learning (TBCL), which hierarchically partitions the diffusion timesteps into coarse-grained parent blocks and fine-grained child blocks. This hierarchy allows cache constraints to be inherited across blocks, significantly increasing the cache ratio. Building on TBCL, we propose semantic-guided cache loss (SGCL), a semantic-aware dynamic gating mechanism. This design keeps training and inference consistent and adds minimal computational overhead at inference time. Experimental results demonstrate that HiCache significantly outperforms advanced diffusion samplers and previous learning-based caching methods at the same inference speed. Moreover, at cache ratios above 50%, HiCache avoids the severe performance degradation observed in previous methods.
Mei et al. (Thu,) studied this question.