What question did this study set out to answer?

This study asks whether, in mixed-precision NF4 quantization of diffusion-transformer image models, useful compression depends on which layers are deliberately left unquantized.

May 29, 2026 | Open Access

Targeted Layer Retention for Mixed-Precision NF4 Diffusion Transformers

Key Points

Developed a targeted layer-retention method for mixed-precision NF4 quantization of diffusion transformers.
Retained architecture-specific first, last, and boundary modules in bfloat16 while quantizing the middle transformer body to 4-bit NormalFloat (NF4) with double quantization.
Utilized public Hugging Face artifacts for text-conditioned image editing and generation tasks.
Demonstrated the retention rule through released artifacts and deployment observations; benchmarks are described as a future direction.
Released artifacts cover Qwen-Image-Edit variants (text-conditioned image editing) and ERNIE-Image (text-to-image generation).
The central claim is that useful compression depends not only on using NF4 but on choosing which layers are not quantized.
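
To make the compression trade-off in the points above concrete, the following is a rough storage estimate, not a measurement from the preprint. It assumes the commonly used NF4 layout of one scale per 64-weight block, with double quantization storing those scales in 8 bits plus one 32-bit scale per 256 blocks. The function names, the parameter count, and the retained fraction are all hypothetical and chosen only for illustration.

```python
def nf4_bits_per_param(block_size=64, double_quant=True, dq_block_size=256):
    """Approximate storage bits per weight for blockwise NF4.

    Assumed layout (not taken from the preprint): one absmax scale per
    `block_size` weights; with double quantization the scale is stored in
    8 bits, plus one 32-bit scale per `dq_block_size` blocks.
    """
    if double_quant:
        overhead = 8 / block_size + 32 / (block_size * dq_block_size)
    else:
        overhead = 32 / block_size
    return 4 + overhead


def mixed_precision_gib(total_params, retained_fraction,
                        block_size=64, double_quant=True):
    """Estimated weight storage in GiB when a fraction stays in bfloat16."""
    retained = total_params * retained_fraction   # kept at 16 bits each
    quantized = total_params - retained           # stored as NF4
    bits = retained * 16 + quantized * nf4_bits_per_param(block_size, double_quant)
    return bits / 8 / 2**30


if __name__ == "__main__":
    # Hypothetical model size and retained fraction, for illustration only.
    total = 20e9
    for frac in (0.0, 0.05, 0.10, 1.0):
        print(f"retained={frac:.2f}  ~{mixed_precision_gib(total, frac):.1f} GiB")
```

The estimate covers weights only; activations, text encoders, and VAE components are outside its scope. It shows that each retained bfloat16 parameter costs roughly four times the storage of an NF4 one, which is why the retained set has to be chosen selectively.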

Abstract

This preprint documents a targeted layer-retention method for mixed-precision NF4 quantization of diffusion-transformer image models. The method retains architecture-specific first, last, and boundary modules in bfloat16 while quantizing the middle transformer body to 4-bit NormalFloat (NF4) with double quantization. The central claim is that useful compression depends not only on using NF4, but on selecting which layers should not be quantized. Public Hugging Face artifacts demonstrate this rule across text-conditioned image editing with Qwen-Image-Edit variants and text-to-image generation with ERNIE-Image. The first related public artifact, ovedrive/qwen-image-edit-4bit, was released on Hugging Face on 19 August 2025. This Zenodo record archives the manuscript describing the method, artifact history, deployment observations, limitations, and future benchmark directions.
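
The retention rule described in the abstract can be sketched as a partition of module names into a retained set and a quantized set. This is a minimal illustration under stated assumptions, not the author's implementation. The `transformer_blocks.<i>.` naming scheme, the function `plan_retention`, its `keep_first` and `keep_last` parameters, and the example module names are all hypothetical. The sketch also treats any module outside the numbered blocks as a "boundary" module, which is one possible reading of the abstract.

```python
import re

# Hypothetical naming scheme for the repeated transformer blocks.
BLOCK_RE = re.compile(r"^transformer_blocks\.(\d+)\.")


def plan_retention(module_names, keep_first=1, keep_last=1):
    """Split module names into (retained in bfloat16, quantized to NF4).

    Retained: modules outside the numbered blocks (treated here as boundary
    modules) plus the first `keep_first` and last `keep_last` blocks.
    Quantized: every module in the remaining middle blocks.
    """
    names = list(module_names)
    indices = sorted({int(m.group(1)) for m in map(BLOCK_RE.match, names) if m})
    first = indices[:keep_first]
    last = indices[max(0, len(indices) - keep_last):]
    retained_blocks = set(first) | set(last)

    keep, quantize = [], []
    for name in names:
        m = BLOCK_RE.match(name)
        if m is None or int(m.group(1)) in retained_blocks:
            keep.append(name)
        else:
            quantize.append(name)
    return keep, quantize


if __name__ == "__main__":
    # Made-up module names for a four-block model.
    example = [
        "img_in",
        "transformer_blocks.0.attn.to_q",
        "transformer_blocks.1.attn.to_q",
        "transformer_blocks.2.attn.to_q",
        "transformer_blocks.3.attn.to_q",
        "proj_out",
    ]
    keep, quantize = plan_retention(example)
    print("bfloat16:", keep)
    print("NF4:     ", quantize)
```

With a bitsandbytes-backed loader, a retained list like this is the kind of value one would typically pass as the skip list, for example the `llm_int8_skip_modules` argument of the Hugging Face `BitsAndBytesConfig`, alongside `bnb_4bit_quant_type="nf4"` and `bnb_4bit_use_double_quant=True`. The abstract does not state whether the released artifacts were built this way.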
