Reconstructing high-quality images at low bitrates has long been a challenging task. Previous studies have made this feasible by leveraging the prior knowledge of diffusion models. However, in image compression, the diffusion model baseline fails to adequately integrate high-level semantic information, and the diffusion prior is not well aligned with the learning objectives of the compressor. To address these issues, we propose Diffusion Prior Refinement for Efficient Low-rate Image Compression (DiRIC), an image compression scheme based on Stable Diffusion. DiRIC efficiently encodes low-level image information and achieves a highly realistic reconstruction of the original image by leveraging high-level semantic features and the prior knowledge inherent in diffusion models. Specifically, DiRIC employs a multi-feature compressor to represent crucial low-level information at extremely low bitrates; meanwhile, it acquires more robust hybrid semantics through a pre-embedding mechanism, providing abundant contextual support for the decoder. Furthermore, we design a consistency skip module to enhance and refine the diffusion prior. To improve decoding efficiency, we employ a noise-level estimator that reduces the number of sampling steps, enabling decoding that is both high-fidelity and efficient. Extensive experimental results show that this method not only achieves state-of-the-art perceptual fidelity but also significantly outperforms previous perceptual image compression methods in terms of statistical fidelity. Compared with state-of-the-art (SoTA) diffusion baselines, we achieve BD-rate improvements of 147.44% and 84.63% in terms of FID and PSNR, respectively, alongside a 19× increase in decoding speed.
Xia et al. (Thu,) studied this question.