What question did this study set out to answer?

This study aims to develop a noise-unconstrained generative model for dataset distillation that generates effective proxy datasets.

May 9, 2026

Dataset Distillation via a Noise-Unconstrained Generative Model

Jingxuan Zhang (East China University of Science and Technology), Lei Dai (East China University of Science and Technology), Fei Ye (Xi'an University of Science and Technology)

Key Points

This study aims to develop a noise-unconstrained generative model for dataset distillation that generates effective proxy datasets.
Introduced an adaptive matching coefficient in the distillation stage to align generated images with representative class elements.
Extended the MiniMax loss function to reduce optimization difficulty in the distillation stage.
Ensembled features across generated images in the deployment stage using gradient-matching-based dataset distillation.
On ImageWoof with 50 distilled images per class and a 6-layer ConvNet for evaluation, generated images outperformed 25%, 50%, and 75% of the original images by 8.4%, 6.3%, and 8.3% in distillation performance, respectively.
Showed through theoretical analysis based on McDiarmid's inequality that the proposed components can reduce the generalization error of the baseline method.
Demonstrated efficacy in experiments on 11 benchmarks covering both low- and high-resolution datasets.
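
The gradient-matching idea named in the key points can be illustrated with a minimal sketch. This is not the paper's implementation: the logistic model, the cosine-based distance, the finite-difference update, and all function names (`loss_gradient`, `matching_distance`, `refine_synthetic`) are assumptions chosen to keep the example small and dependency-free. Gradient-matching methods typically use a neural network and backpropagate through the gradient instead of using finite differences.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_gradient(w, xs, ys):
    """Gradient of the mean binary cross-entropy of a logistic model w.r.t. w."""
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj / len(xs)
    return grad

def matching_distance(g_a, g_b, eps=1e-12):
    """One minus the cosine similarity of two gradient vectors."""
    dot = sum(a * b for a, b in zip(g_a, g_b))
    norm_a = math.sqrt(sum(a * a for a in g_a))
    norm_b = math.sqrt(sum(b * b for b in g_b))
    return 1.0 - dot / (norm_a * norm_b + eps)

def refine_synthetic(w, real_x, real_y, syn_x, syn_y, lr=0.5, h=1e-4, steps=50):
    """Adjust synthetic inputs so their loss gradient aligns with the real-data gradient.

    Illustrative assumption: finite differences stand in for backpropagation,
    and a step is kept only if it lowers the matching distance.
    """
    g_real = loss_gradient(w, real_x, real_y)
    syn_x = [list(x) for x in syn_x]

    def dist(xs):
        return matching_distance(g_real, loss_gradient(w, xs, syn_y))

    for _ in range(steps):
        base = dist(syn_x)
        grads = []
        for i in range(len(syn_x)):
            row = []
            for j in range(len(syn_x[i])):
                old = syn_x[i][j]
                syn_x[i][j] = old + h
                row.append((dist(syn_x) - base) / h)
                syn_x[i][j] = old
            grads.append(row)
        candidate = [[v - lr * g for v, g in zip(x, gr)]
                     for x, gr in zip(syn_x, grads)]
        if dist(candidate) < base:
            syn_x = candidate   # accept the improving step
        else:
            lr *= 0.5           # otherwise shrink the step size
    return syn_x
```

Because a candidate update is accepted only when it lowers the distance, the refined synthetic set never matches the real-data gradient worse than the starting set. That property is what the sketch is meant to show, not a claim about the paper's results.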

Abstract

Dataset distillation (DD) aims to synthesize a dataset more compact than the original, such that models trained on it are expected to generalize as well as models trained on the original dataset. Previous work based on a generative model (GM) faces several limitations. First, the GM struggles to generate representative samples due to a lack of constraints. Second, it overlooks the relationships between generated samples, limiting its effectiveness. In this paper, a new noise-unconstrained GM-based DD framework is proposed. In the distillation stage, an adaptive matching coefficient is introduced to align generated images with representative class elements, and the MiniMax loss function is extended to reduce the optimization difficulty. In the deployment stage, features across the generated images are ensembled by gradient-matching-based DD. Theoretical analysis based on McDiarmid's inequality demonstrates that the proposed components can reduce the generalization error of the original baseline method. We also provide insights into the potential of generated images as an effective proxy dataset for DD. For example, on the ImageWoof dataset with 50 distilled images per class and a 6-layer ConvNet for evaluation, generated images outperform 25%, 50%, and 75% of the original images by 8.4%, 6.3%, and 8.3% in distillation performance, respectively. Our method effectively handles both low- and high-resolution datasets, with experiments on 11 benchmarks demonstrating its efficacy.
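
For reference, the standard statement of McDiarmid's inequality, on which the theoretical analysis is said to rely, is given here. The symbols $X_i$, $f$, $c_i$, and $t$ are generic notation, not taken from the paper, and how the paper instantiates $f$ is not stated in the abstract.

```latex
% Let X_1, ..., X_n be independent random variables and let f satisfy the
% bounded-differences condition with constants c_1, ..., c_n:
\[
  \sup_{x_1,\dots,x_n,\,x_i'}
  \bigl| f(x_1,\dots,x_i,\dots,x_n) - f(x_1,\dots,x_i',\dots,x_n) \bigr|
  \le c_i , \qquad i = 1,\dots,n .
\]
% Then, for every t > 0,
\[
  \Pr\bigl( f(X_1,\dots,X_n) - \mathbb{E}\,[\,f(X_1,\dots,X_n)\,] \ge t \bigr)
  \le \exp\!\left( - \frac{2t^2}{\sum_{i=1}^{n} c_i^2} \right).
\]
```

In generalization analyses, $f$ is commonly taken to be a quantity such as the gap between empirical and expected risk, so smaller constants $c_i$ give a tighter bound. Whether the paper argues along exactly these lines is an assumption.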
