What question did this study set out to answer?

The study aims to improve data-free model stealing by addressing high query costs, limited attack accuracy, and low sample utilization efficiency.

March 10, 2026 · Open Access

MGCT: A Multi-Generator Collaborative Training Approach for Data-Free Model Stealing

Key Points

Developed the MGCT framework, which uses a two-phase optimization process
Used a parallelized generator architecture for multi-generator collaborative training
Implemented a model distillation mechanism based on enhanced experience replay
Integrated a hybrid sampling strategy for dynamic selection of low-confidence samples
Improved cloning accuracy by 2.07% over baseline methods on CIFAR-100
Achieved a 14.14% enhancement in query efficiency under budget constraints
Demonstrated improved representational capacity of the training data

Abstract

Model stealing attacks in the Machine Learning as a Service (MLaaS) paradigm face multiple technical challenges, particularly when the attacker has no access to training data. Even under such strict constraints, target models remain vulnerable to reverse engineering and replication. Existing data-free model stealing methods, however, suffer from excessively high query costs, limited attack accuracy, and low sample utilization efficiency, all of which undermine the practical feasibility of such attacks. In this work, we design a Multi-Generator Collaborative Training Approach for Data-Free Model Stealing (MGCT) to address these bottlenecks. The framework extracts the target model through a two-phase optimization process. In the first phase, a parallelized generator architecture performs multi-generator collaborative training, which enhances the diversity of synthetic samples and, through a load-balancing strategy, alleviates the optimization difficulty faced by a single generator. In the second phase, a model distillation mechanism based on enhanced experience replay is applied. By integrating a hybrid sampling strategy with class-balancing constraints, it dynamically selects low-confidence samples while maintaining class distribution equilibrium, thereby improving the representational capacity of the training data. Experimental evaluation shows that MGCT improves core metrics, including clone accuracy and query efficiency, over existing baseline methods. Specifically, on the CIFAR-100 dataset, MGCT improves cloning accuracy by 2.07% and achieves a 14.14% improvement under constrained budgets. These results demonstrate the efficacy of the proposed method in addressing the technical bottlenecks of data-free model stealing.
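
The hybrid sampling idea in the second phase can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the sampling rule, so the function name `hybrid_sample`, the `low_conf_fraction` parameter, and the `(sample_id, label, confidence)` buffer layout are all assumptions made for illustration. The sketch gives each class an equal quota from a replay buffer, fills part of that quota with the lowest-confidence samples, and fills the rest at random.

```python
import random
from collections import defaultdict


def hybrid_sample(buffer, batch_size, low_conf_fraction=0.5, seed=None):
    """Pick a class-balanced batch from a replay buffer (illustrative only).

    buffer: list of (sample_id, label, confidence) tuples, where confidence
            is a hypothetical per-sample score in [0, 1].
    low_conf_fraction: share of each class quota reserved for the
            lowest-confidence samples; the rest is drawn at random.
    """
    if not buffer:
        return []
    rng = random.Random(seed)

    # Group buffer entries by class label.
    by_class = defaultdict(list)
    for entry in buffer:
        by_class[entry[1]].append(entry)
    classes = sorted(by_class)

    # Class-balancing constraint: equal quota per class; leftover slots
    # go to the first few classes.
    base, extra = divmod(batch_size, len(classes))

    selected = []
    for i, label in enumerate(classes):
        entries = by_class[label]
        quota = min(base + (1 if i < extra else 0), len(entries))
        n_low = round(quota * low_conf_fraction)

        # Low-confidence part: take the n_low least confident samples.
        ranked = sorted(entries, key=lambda e: e[2])
        selected.extend(ranked[:n_low])

        # Random part: fill the remaining quota from the other samples.
        selected.extend(rng.sample(ranked[n_low:], quota - n_low))
    return selected
```

Mixing a random share into each quota is one plausible way to keep the batch from collapsing onto a few hard samples; the actual balance used by MGCT may differ.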
